UNRELIABILITY OF GROUP TEST PROFILES¹
FLORENCE T. SNODGRASS 


University of New Brunswick 


The literature on trait variability within the individual is slight 
compared with that on individual differences. However, with the 
current interest in profiles and patterns, there is likely to be more 
attention paid to the subject in the future. Formerly, the emphasis 
in measurement was on the level of performance, but the mounting 
attention to profiles has broadened the focus to include, also, the 
pattern of performance. This statement applies to the use made of 
both group and individual test results for purposes of guidance, 
diagnosis and prediction. 

The wide range of individual variability reported by Hull² was
corroborated by the later investigations of Foran, Lillie and
O’Leary,³ and of Stout.⁴ Other studies, only incidentally
concerned with extent of variability, have all concurred with the
findings of these three investigations. The variability of subtest
scores in respect to the individual’s mean intelligence test score is
called unevenness. This variability or unevenness is described
numerically by some type of differential score and is represented
graphically by a profile. Schlesser’s study⁵ is the only one, to the





1 A summary of a dissertation prepared under the direction of Professor
J. W. Tilton and submitted in partial fulfillment of the requirements for
the degree of Doctor of Philosophy at Yale University, June, 1949.

2 Clark L. Hull, Aptitude Testing. Yonkers-on-Hudson, N. Y.: World
Book Company, 1928, p. 47.

3 T. G. Foran, Gerald A. Lillie and Charles E. O’Leary, A Study of Trait
Variability. Educational Research Bulletins, Washington, D.C.: The
Catholic University Press, 1928, p. 23.

4 H. G. Stout, “Variations of normal children,” Journal of Experimental
Education, 6: 84-100, September, 1937.

5 George E. Schlesser, “Development of special abilities at the
junior-high age,” Journal of Educational Research, 40: 39-51, September, 1946.
writer’s knowledge, which has dealt with the reliability of this
profile. In comparing the reliabilities of ability scores with the 
reliabilities of the differential or unevenness scores obtained from 
an achievement test battery, Schlesser found that even in the most 
reliably tested ability, slightly less than twenty per cent of the 
trait differences could be said to have significance. 

Little work has been done to describe how stable the profile is 
from one test performance to the next. There may be much ir- 
regularity in the effort put forth by individual pupils in taking a 
group intelligence test. Some may work hard on one subtest and 
let up on the next; others may work consistently throughout the 
test battery. Such irregularity of effort would modify the shape of 
the profile essentially caused by trait differences. These possibili- 
ties raise two questions: 

1) To what extent is the shape of the profile related to the order 
of subtests? 

2) Is the shape of the profile related to the interval between 
subtests? 

The relation between the extent of unevenness and educational
achievement has been studied. DeVoss⁶ found only slight and
insignificant differences between the unevenness of abilities of gifted
and unselected children. Gray⁷ found individual variability in
educational achievement somewhat greater in the dull group than in
either the average or bright group, but agreed with DeVoss in
finding no true difference between the average and the bright.
Measuring achievement by college marks instead of a standardized
test, Rausch⁸ found students with the greatest variability to have
the poorest achievement in college. Pressey⁹ found no evidence of
any real difference in the amount of unevenness between the dull,
average and bright. Merrill,¹⁰ on the other hand, concluded that





6 L. M. Terman, Editor, Genetic Studies of Genius. Stanford University,
Stanford University Press, 1925, chapter 12.

7 Susan W. Gray, “The relation of individual variability to intelligence,”
The Journal of Educational Psychology, 35: 201-210, April, 1944.

8 Oscar P. Rausch, “The effects of individual variability on
achievement,” The Journal of Educational Psychology, 39: 469-478, December, 1948.

9 S. L. Pressey, “Irregularity on a Binet examination as a measure of its
reliability,” Psychological Clinic, 12: 236-240, May, 1919.

10 Maud A. Merrill, On the Relation of Intelligence to Achievement in the
Case of Mentally Retarded Children. Comparative Psychology Monographs,
1924, vol. 2, no. 10.
unevenness was greatest for the superior group, less for the dull
and least for the normal. Brown¹¹ found about equal unevenness
in three special abilities for the dull and bright. Woodrow,¹²
holding mental age constant, found that unevenness was at a
minimum for children of average IQ and that children of either low or
high IQ have greater unevenness than those of average. He also
found a tendency, though not one with a high degree of reliability,
for the children of low IQ to be more uneven than those of high
IQ. Tilton¹³ disagreed with these conclusions after making a
critical analysis of Woodrow’s data. From data of his own, Tilton
made the same inferences that resulted from his re-examination of
Woodrow’s data; namely, an inverse relationship between
unevenness and IQ. In a later study Tilton¹⁴ found that unevenness was
also directly related to progress in school. Pupils accelerated for
their mental age were more uneven than the at-grade group, and
the retarded pupils were less uneven than the at-grade. These
findings led Tilton to suggest that these relationships might be due
to unreliability and to raise these questions: Is dullness associated
with more than average unreliability in taking group intelligence
tests? Is the greater unevenness of the accelerated group at school
a function of unreliability? Another direct relationship which
Woodrow¹⁵ reported was between increased unevenness and
advancing mental age. Might this direct relationship, also, be a
function of unreliability?

In light of the issues that have been raised in the preceding para- 
graphs, this study was undertaken to answer the following ques- 
tions: 

1) How reliable are profiles? 

2) Do slight variations from standard procedure in the adminis- 
tration of tests alter the profiles? 

3) Are individual differences in unevenness due to individual 





11 A. W. Brown, The Unevenness of the Abilities of Dull and of Bright
Children. New York: Teachers College Contributions to Education, No.
222, 1926.

12 Herbert Woodrow, “Mental unevenness and brightness,” Journal of
Educational Psychology, 19: 289-302, May, 1928.

13 J. W. Tilton, “The relation between IQ and trait difference as measured
by group intelligence tests,” Journal of Educational Psychology, 38:
343-352, October, 1947.

14 J. W. Tilton, unpublished manuscript, 1948.

15 Herbert Woodrow, op. cit., p. 294.
differences in unreliability? If so, (1) is the greater unevenness of
the dull due to their greater unreliability? (2) Is the increased
unevenness in advancing mental age due to unreliability? (3) And is
the greater unevenness of the accelerated at school due to
unreliability?


PROCEDURE 


The experiment was conducted in the junior high schools of New 
Haven, Connecticut. The test used was the Terman-McNemar 
Test of Mental Ability, Form C, which is a battery of seven sub- 
tests. It was administered in all the schools in the usual manner. 
A week later it was repeated in the same way to approximately
one-third of the population, to be known in this study as the
standardized group. After the same interval it was administered,
by reversing the order of the subtests, to another third, to be known
as the reversed group; and to the remaining third, to be known as
the separated group, by spacing each subtest on consecutive
school days. The interval of one week was thought to be short
enough to obviate marked gains due to learning, and long enough 
to minimize practice effects. The experiment was so designed in 
order to determine if the unreliability affecting profiles is changed 
by these two variants in standardized administration. 

Since Woodrow¹⁶ had found a marked increase in unevenness
with advancing mental age, it was necessary in this study to hold 
mental age constant in order to keep this relationship under con- 
trol. The range of mental age levels selected for the sample was 
dependent on those levels for which the test was adequate. When 
the distributions of the various mental age groups were tested for 
skewness, the 12:0-12:5 was the lowest age group and the 14:6-14:11
the highest found to be without skewness. Accordingly, the
sample selected included the six half-year mental age groups from
12:0-14:11, inclusive, which numbered 1138 cases. At each of these
levels each subtest distributed the members of the group satis- 
factorily and without skewness. 

Inasmuch as it was necessary to express each subtest score in 
comparable units for each of the six mental age groups, all scores 
were converted into sigma scores. For this purpose, the total sam- 
ple was divided both on the basis of sex and half-year mental age 
levels. The terms unevenness score, differential score and trait 





16 op. cit. 
difference score are used interchangeably to mean the difference
between the individual’s average sigma score and his sigma score 
on a particular subtest. Hence the average unevenness score for 
each individual was the mean deviation of that individual’s seven 
scores. 
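
The scoring just described can be sketched as follows. This is only a minimal illustration under stated assumptions, not the computation used in the study: the function names and the toy raw scores are hypothetical, the sigma scores are taken as simple unscaled z scores, and the single list of pupils stands in for one sex and half-year mental age group.

```python
from statistics import mean, pstdev

def sigma_scores(raw_by_pupil):
    """Convert raw subtest scores to sigma (z) scores within one comparison group.

    raw_by_pupil: list of per-pupil lists, one raw score per subtest.
    """
    n_subtests = len(raw_by_pupil[0])
    columns = [[pupil[j] for pupil in raw_by_pupil] for j in range(n_subtests)]
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [[(pupil[j] - means[j]) / sds[j] for j in range(n_subtests)]
            for pupil in raw_by_pupil]

def unevenness_scores(sigma_row):
    """Trait difference (U) scores: each sigma score minus the pupil's own average."""
    level = mean(sigma_row)
    return [s - level for s in sigma_row]

def average_unevenness(sigma_row):
    """Mean deviation of a pupil's sigma scores about the pupil's own average."""
    return mean([abs(u) for u in unevenness_scores(sigma_row)])

# Hypothetical raw scores for three pupils on seven subtests
raw = [
    [20, 15, 30, 12, 25, 18, 22],
    [24, 11, 27, 16, 21, 20, 19],
    [16, 13, 33, 14, 29, 16, 25],
]
sigma = sigma_scores(raw)
profile = unevenness_scores(sigma[0])    # the seven peaks and dips for pupil 1
avg_u = average_unevenness(sigma[0])     # that pupil's average unevenness
```

By construction the seven U scores of any pupil sum to zero, so a peak on one subtest is necessarily offset by dips elsewhere.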

The unreliability referred to in this study is the inconsistency of 
test performance from one administration of the test to the next, 
using the same form. To measure the unreliability of each pupil’s 
performance a score known and defined in this study as a dis- 
crepancy or ‘D’ score was adopted. D scores were derived in two 
different ways: one set from raw scores, one from sigma scores. A 
raw D score was defined as the sum of the differences, after cor- 
rection was made for practice effects, between the raw scores on 
the first and second administration of the test. The D scores were 
derived from sigma scores by the same method except that there 
was no necessity to correct for practice effects. It may be noted 
that the direction or sign of the difference is of no importance in 
measuring the individual’s unreliability. The larger the individual’s 
D score, the greater is his unreliability in taking the test. 
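
A D score of this kind might be computed as in the sketch below. The article does not say how the correction for practice effects was made; subtracting an assumed group mean gain per subtest from the second administration is one plausible reading and is labeled as an assumption here. The function name and all numbers are hypothetical.

```python
def discrepancy_score(first, second, practice_gain=None):
    """Sum of absolute subtest differences between two administrations.

    practice_gain: assumed per-subtest gain removed from the second
    administration (raw scores only); None means no correction (sigma scores).
    """
    if practice_gain is None:
        practice_gain = [0.0] * len(first)
    return sum(abs((b - g) - a)
               for a, b, g in zip(first, second, practice_gain))

# Hypothetical raw scores for one pupil on seven subtests
first = [20, 15, 30, 12, 25, 18, 22]
second = [23, 14, 33, 15, 24, 21, 24]
gain = [2, 1, 2, 1, 1, 2, 1]     # assumed group mean gains, one per subtest

raw_d = discrepancy_score(first, second, gain)

# Sigma scores need no practice correction
sigma_d = discrepancy_score([0.5, -0.2], [0.1, 0.3])
```

Taking absolute values reflects the statement that the sign of the difference is of no importance.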


THE RELIABILITY OF THE PROFILE 


Since the profile may be described with unevenness scores, its 
reliability can be measured by them. It is of course recognized that 
part of the profile is due to the unreliability of measurement. 
This fact is often ignored and the profile is misused because too 
much faith is put in its precision. The question arises: how high a 
peak or how low a dip must there be in a profile before we can 
be certain that it indicates an ability or disability? The reliabilities 
of the unevenness scores, which measure trait differences in the 
individual, are compared with the reliabilities of corresponding 
scores from which the profiles were made, as shown in Table I. 

The first column in this table gives the reliability of the abilities 
as measured by the seven subtests from the two administrations 
of the test battery. An ability will be defined as the standard score 
a pupil makes on test 1, 2, 3, etc. The average of each pupil’s 
standard scores will be called the pupil’s general level of ability. 
Special ability or disability will be defined as his standard score 
on a given test minus his general level or average. Thus the peaks 
or dips of the profile above and below the general level will be 
graphic representations of special abilities and disabilities, re- 
TABLE I. COMPARISON OF THE RELIABILITIES OF TRAIT AND
TRAIT DIFFERENCE SCORES FOR THE STANDARDIZED GROUP

Subtest   Reliabilities of   Reliabilities of     Standard Error of Measurement
          Abilities          Trait Differences    of Trait Differences
1         .64                .62                  .683
2         .32                .29                  .860
3         .72                .50                  .566
4         .54                .43                  .782
5         .78                .61                  .657
6         .47                .34                  .852
7         .66                .51                  .783
          av. z = .707126    av. z = .522644
          av. r = .605       av. r = .480

spectively. And the term ‘trait difference’ heading the second col- 
umn will be used to mean either a special ability or disability. 
(The reader is reminded of the interchangeable usage in this article 
of the terms trait difference score, differential score and uneven- 
ness ‘U’ score to measure these peaks and dips above and below 
the general level.) Here also the reliabilities from the two test 
administrations were calculated. 

Schlesser¹⁷ reported a similar comparison of reliabilities of
abilities and trait differences. Using the severe test of three standard
errors of measurement from the mean before he admitted a reliable 
differential score, he found that, even in the most reliably tested 
area, only about twenty per cent of trait differences were signifi- 
cant. Applying this same criterion for significance to the present 
data, only about five per cent of the trait differences in the most 
reliably tested area (subtest 5) are significant. More general prac- 
tice is to regard a critical ratio of 1.96 as being sufficiently high to 
indicate significance. An examination of the present data with this
adjusted criterion indicated that twenty-five per cent of the peaks
and dips are significant in the most reliably tested area (subtest 5). 
In the least reliably tested area (subtest 2), ten per cent are signifi- 
cant. 
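
A rough check on these percentages can be made from the reliabilities alone. The sketch below assumes that trait difference scores are normally distributed and that the standard error of measurement equals the standard deviation times the square root of one minus the reliability; both are assumptions added for illustration, since the percentages above were presumably obtained from the actual scores. The function names are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Cumulative probability of the unit normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def expected_share_significant(reliability, k):
    """Share of trait differences expected to exceed k standard errors of measurement.

    Assumes normal scores and SE of measurement = SD * sqrt(1 - reliability).
    """
    threshold_in_sd = k * sqrt(1.0 - reliability)
    return 2.0 * (1.0 - normal_cdf(threshold_in_sd))

share_5_strict = expected_share_significant(0.61, 3.0)    # subtest 5, three SEs
share_5_usual = expected_share_significant(0.61, 1.96)    # subtest 5, C.R. of 1.96
share_2_usual = expected_share_significant(0.29, 1.96)    # subtest 2, C.R. of 1.96
```

Under these assumptions the three shares come to roughly 6, 22 and 10 per cent, close to the five, twenty-five and ten per cent reported above.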

From these findings it is concluded that most of the irregularities
of a profile, as measured by the test used, are due to errors in
measurement. Profile differentiations are reliable only about
twenty-five per cent of the time in the most reliably tested area.

17 G. E. Schlesser, op. cit., p. 41.


THE RELATIONS OF UNRELIABILITY OF PROFILES TO TEST 
ADMINISTRATION 


The extent to which the shape of the profile was related to the 
order of subtests was investigated by comparing the standardized 
group with the reversed group. The practice effects as indicated 
by raw scores were compared for these two groups and none of the 
differences were significant. Furthermore, the range and standard 
deviation on corresponding subtests for the two groups were
practically the same. Additional evidence of the similarity in shape of the
profiles of these two groups was found by comparing their average D
scores (measure of individual’s unreliability or change in taking the
test from one administration to the next) computed both on the
basis of raw scores and sigma scores. In both cases the standard
error of the difference far exceeded the obtained difference between 
the two groups. A comparison made of the reliabilities of the trait 
difference scores on each subtest showed no significant difference 
for the average of the two groups, nor did it show any consistent 
direction in the differences between the reliabilities of correspond- 
ing subtests. Surprisingly enough, subtest 4, the relative position 
of which was not changed in either group, showed the greatest 
amount of change. 

A final comparison of these two groups was made of the
reliabilities of their average U₁ and U₂ scores,¹⁸ as shown in Table II.
The unevenness scores from the two administrations of each sub- 
test were correlated. Then these r’s were converted into z values 
for averaging and the average z’s converted back to r’s. It will be 
noted that the difference on the whole between the two groups 
was not significant, nor was there any consistency in the direction 
of the differences between coefficients of corresponding subtests, 
as Table II shows. The standardized and reversed groups were 
shown to be similar by all the criteria applied: average gain, meas- 
ured both in terms of central tendency and spread; discrepancy 
or change from test to retest; reliability of both subtest scores and 
unevenness scores. The reliabilities of the subtests showed a criss- 





18 The subscripts refer to the U scores obtained from the first and second 
administration. 

















The Journal of Educational Psychology 


TABLE II. COMPARISON OF RELIABILITY COEFFICIENTS OF
UNEVENNESS SCORES FOR THE STANDARDIZED
AND REVERSED GROUPS

Subtest   Standardized Group   Reversed Group     SE of Difference
1         .62                  .53
2         .29                  .44
3         .50                  .54
4         .43                  .49
5         .61                  .56
6         .34                  .36
7         .51                  .50
          av. z = .52264       av. z = .53737     .02598
          av. r = .480         av. r = .490       C.R. = .566





crossing in the direction of the differences, which was hard to ac- 
count for, from the hypothesis set up. Also, the test showing the 
greatest amount of change was the one whose relative position was 
the same for both groups; the reliability of scores on the first and 
last subtests, on the average, does not seem to have been affected 
by the reversal. For these reasons it would seem that it was not 
the reversing which caused the changes, but other unaccounted 
for factors. Hence it may be concluded that the order of the sub- 
tests is not related to the shape of the profile. Whatever unrelia- 
bility there is in the profile is not due to the order of subtests. 
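
The averaging of coefficients through z values, and the critical ratio reported in Table II, can be reproduced with the short sketch below. The coefficients are those of Table II; the standard error of the difference is taken from the table as given, since the article does not say how it was derived. The function and variable names are hypothetical.

```python
from math import atanh, tanh

def average_r(coefficients):
    """Average correlations through Fisher's z; returns (mean z, r for that z)."""
    zs = [atanh(r) for r in coefficients]
    mean_z = sum(zs) / len(zs)
    return mean_z, tanh(mean_z)

standardized = [0.62, 0.29, 0.50, 0.43, 0.61, 0.34, 0.51]   # Table II
reversed_grp = [0.53, 0.44, 0.54, 0.49, 0.56, 0.36, 0.50]   # Table II

z_std, r_std = average_r(standardized)
z_rev, r_rev = average_r(reversed_grp)

se_difference = 0.02598          # copied from Table II, not derived here
critical_ratio = abs(z_rev - z_std) / se_difference
```

Averaging in z units rather than averaging the r's directly avoids the bias that comes from the skewed sampling distribution of r.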
Does a battery of subtests all taken at the same time, i.e., at 
one sitting, yield the same profile that would be obtained if the 
several tests were taken at different times? A comparison of the 
results for the standardized group with those for the separated 
group provides an answer. When the practice effects of these two
groups were compared none of the differences were found to be 
reliable. In addition, the two groups were practically the same in 
regard to range and standard deviation. Further evidence of the 
similarity in the shape of the profiles of these two groups was ob- 
tained when a comparison of their average D (unreliability) scores 
was made. The standard errors of the difference between these 
scores, computed on the basis of both raw score and sigma score 
data, far exceeded the obtained difference. 
A comparison of the reliabilities of subtest scores (trait
measures) showed that while the average coefficient of the separated
group was significantly lower than that for the standardized group,
the direction of the differences was inconsistent. This, coupled
with the fact that only two of the subtests were significantly lower
in the separated group, led the writer to compare the separated with
a larger group. Accordingly, the reversed group was combined with
the standardized group since the two had been found not to be
reliably different. Compared with this enlarged group, the average
reliability for the separated group was not reliably lower,
and again there was no consistent direction in the differences
between the subtest coefficients for the two groups. Only one
subtest (No. 5) was significantly lower in the separated group than
in the combined group. Finally, the reliabilities of the average U₁
and U₂ scores of the separated and standardized groups showed
no reliable differences; nor was there any consistency here in the
direction of the differences between the reliabilities for the
corresponding subtests, as examination of Table III will show.

To restate, the standardized and separated groups were similar 
with respect to the measurements described above. Hence it is 
concluded that the separation of the subtests (by the interval used) 
is not related to the shape of the profile. The unreliability of the 
profile is not the result of the interval between subtests. This
finding may have practical implications for the user of tests: the
administration of many tests requires a longer time than one class
period. The tester need not feel hampered by the demands of such 


TABLE III. COMPARISON OF RELIABILITY COEFFICIENTS OF
UNEVENNESS SCORES FOR THE STANDARDIZED
AND SEPARATED GROUPS

Subtest   Standardized Group   Separated Group    SE of Difference
1         .62                  .50
2         .29                  .38
3         .50                  .51
4         .43                  .33
5         .61                  .57
6         .34                  .36
7         .51                  .49
          av. z = .522644      av. z = .487912    .03051
          av. r = .480         av. r = .450       C.R. = 1.138
time restrictions but can separate the battery to fit the time
schedule. 


RELATION BETWEEN UNRELIABILITY AND IQ 


A third purpose of this study was to investigate the cause of 
individual differences found in unevenness. These data were first 
examined for the presence of the inverse relationship between IQ 
and unevenness which was reported by Tilton.¹⁹ The findings were
adequate to justify a search for any possible relation between un- 
reliability and IQ. 

The reader is reminded of an earlier statement regarding the use
in this study of a discrepancy score (D score) to measure the
unreliability of each pupil’s performance. For the total sample the
reliability of the discrepancy scores was .08 with an SE of .028.
(When this r is stepped up by the Spearman-Brown formula it
becomes .15.) While an r of this size indicates a very low reliability
without practical significance, it more than passes the one per
cent level of statistical significance.²⁰ There is therefore a slight
tendency for some individuals to be more unreliable than others. 
Are these more unreliable pupils the dull pupils? 
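
The two figures in this paragraph can be checked with the sketch below. It assumes that the stepping up was for a doubled length of test, which is the usual use of the Spearman-Brown formula and does yield .15; the function name is hypothetical, and 2.576 is the two-tailed one per cent point of the normal distribution.

```python
def spearman_brown(r, factor=2.0):
    """Reliability expected when a measure is lengthened `factor` times."""
    return factor * r / (1.0 + (factor - 1.0) * r)

r_d = 0.08       # reported reliability of the D scores
se_r = 0.028     # reported standard error of that coefficient

stepped_up = spearman_brown(r_d)    # about .15
ratio = r_d / se_r                  # about 2.86, which exceeds 2.576
```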

The D scores for the dull, average and bright groups were as 
shown in Table IV. 

In determining the probability that the dull and average will 
always exceed the bright in unreliability, only the differences in 
one direction are involved: that is, (1) excess of unreliability of the 
dull over the bright, and (2) excess of unreliability of the average 
over the bright. These differences are reported in Table V. A differ- 
ence between the dull and bright of this magnitude (2.47) and in 
this direction passes the one per cent test. In the same way the 
difference between the average and the bright satisfies the two 
per cent test, and the difference between the dull and average 
combined, and the bright more than satisfies the one per cent test. 
And the findings are very similar when the discrepancy scores were 
computed from raw score data. Evidently the bright are more re- 
liable than the dull and average. It has been shown therefore that 
the dull and average are not only more uneven, but are also more 
unreliable; and that the bright are not only less uneven, but also 





19 J. W. Tilton, loc. cit. 
20 E. F. Lindquist, Statistical Analysis in Educational Research. Boston:
Houghton Mifflin Co., 1940, p. 212.
TABLE IV. DISCREPANCY SCORES FOR DIFFERENT IQ
GROUPS (SIGMA SCORE DATA)

        IQ below 95      IQ 95-104        IQ above 104
M       63.54 (372)      63.19 (421)      59.63 (345)
SD      21.56            20.91            20.80
σM      1.12             1.02             1.12













TABLE V. COMPARISON OF THE DIFFERENCES IN DISCREPANCY
SCORES FOR DIFFERENT IQ GROUPS (SIGMA SCORE DATA)

                            Difference    Standard Error    CR
                                          of Difference
Dull-average                 .35          1.51               .23
Dull-bright                 3.91          1.58              2.47
Average-bright              3.56          1.51              2.36
(Dull + average)-bright     3.72          1.35              2.76














less unreliable. Hence, the negative relationship between uneven- 
ness and IQ which has been reported could be a function of indi- 
vidual differences in test-retest unreliability. 
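
The critical ratios of Table V follow from the means, standard deviations and numbers of cases in Table IV. The sketch below assumes the usual formula for the standard error of the difference between two independent means; the dictionary layout and function name are hypothetical.

```python
from math import sqrt

# (mean, SD, number of cases) of sigma-score D scores, from Table IV
groups = {
    "dull": (63.54, 21.56, 372),
    "average": (63.19, 20.91, 421),
    "bright": (59.63, 20.80, 345),
}

def critical_ratio(a, b):
    """Difference of two independent means over the SE of that difference."""
    mean_a, sd_a, n_a = groups[a]
    mean_b, sd_b, n_b = groups[b]
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff

cr_dull_bright = critical_ratio("dull", "bright")
cr_average_bright = critical_ratio("average", "bright")
cr_dull_average = critical_ratio("dull", "average")
```

The results agree with the first three rows of Table V to within rounding.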


THE RELATION BETWEEN UNRELIABILITY AND MENTAL AGE 


Stemming from the relation found between IQ and unreliability, 
a question arises as to the possible relation between unreliability 
and mental age. Using raw score data the average discrepancy 
scores were 13.48, 12.95 and 13.13 for the twelve, thirteen and four- 
teen mental age level, respectively. Although these scores tend to 
decrease with advancing mental age, they are not consistent in this 
direction; and the differences are not significant (critical ratios of 
1.71 and 1.06, respectively). Hence these data provide no basis 
for attributing to unreliability the relation which Woodrow²¹ found
between increased unevenness and advancing mental age. 

Similar comparisons were made using the D scores derived from 
sigma scores. Here, however, IQ has been held constant lest the 
inverse relationship between IQ and unreliability, which has been 
found in the data, should complicate the findings. For the average 
IQ group these discrepancy scores are 65.01, 62.32 and 62.67 for 
the twelve, thirteen, and fourteen mental age levels, respectively. 





21 Herbert Woodrow, op. cit., p. 294. 
These D scores also decrease with advancing mental age and in a
pattern similar to that of the other D scores. The slight deviation
from the general trend of the fourteen mental age level is not
significant. In this case, also, the differences are not significant;
critical ratios are 1.01 and .91, respectively. Accordingly, there is no
basis in these data for thinking that the relationship which
Woodrow reported was due to individual differences in unreliability.
In fact, the differences here are in a direction opposite to that which
would be necessary to explain Woodrow’s findings on the basis of
unreliability.


RELATION BETWEEN UNRELIABILITY AND ACCELERATION IN SCHOOL 


Reference was made in the Introduction to a direct relationship 
found by Tilton²² between unevenness and acceleration in school:
the accelerated for their mental age being more uneven than the 
at-grade group, and the retarded less uneven than the at-grade. 
Could it be that the greater unevenness of the accelerated is due 
to a greater unreliability on their part? Since unevenness has been 
found to be a function of IQ, the relation between unevenness and 
acceleration in school should be studied by holding IQ and mental 
age constant. For this reason IQ was held constant between 95 and 
104. This average IQ group (95-104) of each of the three mental 
age levels of the sample was divided into the retarded, at-grade 
and accelerated on the basis of their grade placement in school. 
On the basis of normal progress through school, those in the twelve 
mental age group were considered at-grade if they were in the 
seventh grade and accelerated if in the eighth or ninth; those in 
the thirteen mental age group were considered retarded if they 
were in the seventh grade, at-grade if in the eighth and accelerated 
if in the ninth; and in the fourteen mental age group, retarded if 
they were in the seventh or eighth grades and at-grade if in the 
ninth. It is obvious from the foregoing that only the thirteen 
mental age group here contained all three classifications. Because 
of the incompleteness of the twelve and fourteen mental age 
groups for this classification, with the resulting small number of 
cases in the retarded and accelerated groups; and because no con- 
sistent pattern of unevenness was revealed for mental age levels; 
all three were thrown together for analysis. The mean unevenness 
score for the average group (95 to 104 IQ) was 80.38 for the re- 
tarded group (N = 60), 83.11 for the at-grade group (N = 287) 





22 J. W. Tilton, unpublished manuscript, 1948. 
and 83.92 for the accelerated group (N = 74). While the differences
were insignificant, they were in the same direction as reported by
Tilton.²³ Here, as in the relation between unevenness and IQ, these
data could not be expected to show as pronounced differences as 
Tilton’s since they include only a three-grade range, while his 
data covered a five-grade one. In view of the foregoing, the data 
were considered suitable for an investigation of the relationship 
between unreliability and acceleration in school. It will be re- 
membered that the other step preliminary to this investigation— 
namely, that the unreliability scores are statistically reliable—has 
been demonstrated in a previous section. 
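
The grade-placement rule given earlier in this section can be written compactly, as in the sketch below. It covers only the three mental age levels and the three grades (seventh to ninth) of the present sample; the function name is hypothetical.

```python
def school_progress(mental_age_years, grade):
    """Classify a pupil as retarded, at-grade or accelerated for mental age.

    Expected grade is 7 at mental age 12, 8 at mental age 13 and 9 at
    mental age 14, as stated in the text (grades 7 to 9 only).
    """
    expected = {12: 7, 13: 8, 14: 9}[mental_age_years]
    if grade < expected:
        return "retarded"
    if grade > expected:
        return "accelerated"
    return "at-grade"
```

With only grades seven to nine in the sample, the twelve-year level can contain no retarded pupils and the fourteen-year level no accelerated ones, which is why only the thirteen-year level held all three classifications.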

The same average IQ group (421 cases) was examined for un- 
reliability with these results: retarded 13.58; at grade 13.37; ac- 
celerated 13.04. 

The differences between the three groups were so small as to be
of no statistical significance (critical ratios of .71, .87, and .60).
It should be noted, too, that the trend of the differences in the reli- 
ability of these school groups is just opposite to that which one 
must find, in order to explain the direct relation which Tilton re- 
ported between unevenness and acceleration in school according to 
mental age. 

Another comparison of the unreliability of the three school 
groups was made on the basis of the D scores computed from sigma 
scores, with these results: retarded 61.00; at grade 63.64; accel- 
erated 61.93. Here, also, the differences were not significant (the 
standard errors being in each case larger than the obtained differ- 
ences). In these data there is, then, no evidence that the relation- 
ship between unevenness and acceleration may be caused by 
unreliability. In other words, the greater unevenness in the profiles 
of the accelerated is not the result of their greater unreliability. 


SUMMARY 


The data seem to justify the following conclusions, within the 
limits of the test used and subject to the conditions of the investi- 
gation: 

1) There is dependable evidence that much of the unevenness 
of the profiles is due to test unreliability. In the most reliably tested 
areas, only about twenty to twenty-five per cent of the time are 
the trait differential scores or unevenness scores reliable; and in the 
least reliably tested areas only about five per cent of the time are
they reliable. These findings have implications for the guidance 
worker who should be cautious about considering all high and low 
differential scores as evidence of a special ability or disability. 

2) The relation of the unreliability of profiles to test administra- 
tion was negligible. Profiles were not changed significantly by the 
two variations of administration investigated. These findings may 
have implications for the makers and users of tests: a test battery 
may be broken up without fear of its effect on profiles. 

3) Unreliability itself was found to have a very low but statisti- 
cally significant reliability. 

4) The unreliability of profiles was found to be inversely related 
to IQ, in that the bright were found to be more reliable than the 
dull or average in taking group intelligence tests. From this finding 
it seems reasonable to conclude that the greater average uneven- 
ness of the dull is caused in part at least by their greater unrelia- 
bility. 

5) The data provide no basis for attributing the direct relation- 
ship between average unevenness of profile and mental age, to 
unreliability. 

6) And further, the data provide no basis for attributing the 
direct relationship between average unevenness of profile and ac- 
celeration in school for mental age to unreliability. 
STUDENT-CENTERED VERSUS INSTRUCTOR-CENTERED
INSTRUCTION¹


W. J. McKEACHIE 


University of Michigan 


Even psychologists have their stereotypes. And for most of us 
‘student-centered’ and ‘instructor-centered’ are stereotypes. With 
‘student-centered’ we associate the halo terms of democratic, per- 
missive, insight, affective and student growth. ‘Instructor-centered’ 
brings to mind the terms authoritarian, Fascistic, knowledge for its 
own sake, and content-centered. In our psychological subculture 
the mere labels in our title stack the deck against anyone who at- 
tempts to defend the instructor-centered point of view. 

Despite our preconceptions, there are important areas of agree- 
ment; e.g., everyone agrees that we’re teaching students and that 
our job is to promote student learning. Also, almost everyone agrees 
that we want to improve our students’ problem-solving skills. This 
paper aims to do three things: 

1) To break up our stereotypes by examining some of the di- 
mensions which may differentiate student-centered from instruc- 
tor-centered instruction. 

2) To survey a group of research studies on the problem. 

3) To do some theorizing about the problem. 


DIMENSIONS OF DIFFERENCE 


First, what are the dimensions of difference? 


GOALS 


One of the most prominent is the dimension of goal setting. The 
instructor-centered teacher believes that he is ultimately respon- 
sible for determining goals. To quote the report of a study group 
which met at Cornell last year, “If the teacher merits the respon- 
sibility placed in his hands, he knows more than do the students 
about the subject, about the world in which we live, and of the 





1 This paper was presented in an APA symposium, “Student-centered 
versus Instructor-centered Instruction,” sponsored by Divisions 2 and 15 
with Dr. Percival Symonds as chairman, Sept. 2, 1952. 
ways in which a knowledge of psychology can enrich the world” 
(8). The student-centered instructor, on the other hand, believes 
that the group, including both students and instructor, should 
determine the group goals. 

Despite his greater emphasis upon the student in goal-setting, 
the student-centered instructor usually has certain implicit goals 
which he hopes will be achieved by the students. Thus another area 
of difference between instructor-centered and student-centered 
teaching is in the type of goals for which they are aiming. This area 
of goals needs to be plotted in at least two-dimensional space because 
there are three differing types of goals: intellectual, applied, 
and affective. 

One approach emphasizes the traditional intellectual goals of a 
liberal education. It attempts to create an interest in ‘Knowledge 
for its own sake’. The primary goal is to teach students to think. 
The instructor with this approach may be interested in attitudes, 
but they are the attitudes of the scientist toward his subject mat- 
ter, not social attitudes. These are the goals which we usually as- 
sociate with the instructor-centered method, but which may be 
independent of other dimensions of the methods. 

The instructor who is interested in applied goals feels that psy- 
chology can offer much that is useful in the student’s daily life, 
not only in adult life but in college as well. He is interested in 
teaching the facts of psychology only insofar as they can be applied 
by the student. Specific skills such as reading skills or study habits 
are apt to be part of the content of his course. 

The instructor primarily interested in affective goals is likely 
to disavow any suggestion that his class is simply group therapy. 
In spite of this, he is even more dissatisfied than most teachers with 
achievement tests as a criterion of his teaching, and he is quite un- 
happy if his students don’t have some adjustment problems that 
their psychology course can cure. 

Student-centered instruction tends to emphasize the last of these 
three goal-emphases, but none of these goal-emphases are indissolubly 
linked with other dimensions of teaching. Benjamin Bloom, for 
example, has been much interested in the discussion method as a 
technique for attaining intellectual goals. 


METHODS OF TEACHING 


Probably most of us think of student-centered instruction as 
differing from instructor-centered teaching primarily in terms of 
what goes on in the classroom. One of the dimensions here is degree 
of student participation. The student-centered instructor tends to 
encourage a high degree of student verbal participation. Most stu- 
dent-centered instructors, however, are not satisfied if student dis- 
cussion is directed at the instructor. They are interested in 
developing a high degree of inter-student participation. The stu- 
dent-centered instructor feels that students who talk to him are 
maintaining their dependence upon him, and this is in conflict with 
his usual goal of independence. 

Another dimension of classroom behavior is the degree of in- 
structor acceptance of erroneous or irrelevant student contribu- 
tions. Some student-centered instructors emphasize the importance 
of accepting all contributions without evaluation, or at least with- 
out negative evaluation. 

A third dimension of classroom climate is degree of group co- 
hesiveness. Typically the student-centered instructor attempts to 
create a group with a high degree of cohesiveness. He may attempt 
to measure this by counting the number of ‘We’s’ as contrasted 
with the number of ‘I’s’ verbalized in class discussion. 
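The pronoun count mentioned above could be tallied along the following lines. This is only a sketch: the text does not say how the count was made, so the function name `we_i_counts`, the decision to count only the words "we" and "I" (including contractions such as "we're" and "I'm"), and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def we_i_counts(transcript):
    """Count 'we' tokens and 'I' tokens in a discussion transcript.

    Contractions are reduced to the part before the apostrophe, so
    "we're" counts as "we" and "I'm" counts as "I".
    """
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)?", transcript.lower())
    stems = Counter(re.split(r"['’]", token)[0] for token in tokens)
    return stems["we"], stems["i"]


# Hypothetical fragment of class discussion
we_count, i_count = we_i_counts("We think so. I'm not sure we agree, but I will try.")
print(we_count, i_count)
```

A ratio of the two counts, taken over comparable amounts of discussion, would then serve as the rough index of cohesiveness the paragraph describes.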

A further dimension upon which student-centered and instruc- 
tor-centered classes differ is in the degree to which the student feels 
he can influence his own fate. Perhaps it is unnecessary to elaborate 
upon this point, but it may be pointed out that what have been 
called student-centered classes vary from classes in which the 
instructor lays out a course outline, makes assignments, and ac- 
tively guides discussion, to classes in which almost all course plan- 
ning is done by group decisions and the instructor begins class 
with, “What would you like to talk about today?” 

The amount of class time devoted to personal experiences and 
problems of the students may be another dimension. The instruc- 
tor who is concerned with the development of self-insight is more 
apt to permit or encourage discussion of personal problems. 

Obviously one could list many other dimensions upon which in- 
structor-centered and student-centered teaching may differ. Those 
listed are simply some which have been most frequently mentioned 
in the literature. When someone says he has used student-centered 
teaching he usually means that as compared with instructor-cen- 
tered teaching, his class has had a higher degree of one or more of 
these qualities: 

1) Student participation in goal setting. 

2) Emphasis upon affective goals. 
3) Student participation and student interaction. 

4) Instructor acceptance of inaccurate statements. 

5) Group cohesiveness. 

6) Ability to determine its own fate. 

7) Amount of time devoted to discussing personal experiences 
and problems. 


EXPERIMENTAL RESULTS 


One would expect that the controversy between our education’s 
authoritarians and permissivists would long ago have been resolved 
by the cold logic of experimental studies. Unfortunately, this just 
hasn’t happened. The published experimental studies are not in 
agreement and there are a host of unpublished studies which re- 
main unpublished because the two methods used produced no 
significant differences in outcomes. 

By way of some examples: One of the best known comparisons of 
student-centered and instructor-centered instruction is that of Faw 
(2). Faw’s class of one hundred and two students met two hours a 
week to listen to lectures and two hours a week in discussion 
groups of thirty-four. One of the discussion groups was taught by 
a student-centered method, one by an instructor-centered method, 
and one group alternated between the two methods. 

As compared with the instructor-centered class the student- 
centered class was characterized by more student participation, no 
instructor correction of inaccurate statements, lack of direction, 
and more discussion of ideas related to personal experiences. 

Surprisingly enough, Faw’s major measure of attainment of 
objectives was in the intellectual area. Scores on course examina- 
tions showed small but significant differences favoring the student- 
centered method. In the area of his major interests—emotional 
growth—Faw’s method of evaluation was to ask students in the 
student-centered and alternating classes to write anonymous com- 
ments about the class. Generally these comments seemed to indi- 
cate that the students felt that they received greater social and 
emotional value from the student-centered discussion groups than 
they would have from an instructor-centered group. Despite the 
objective test results Faw’s students felt that they would have 
made greater intellectual gains in an instructor-centered class. 

Now compare Faw’s experiment with that of Asch (1). Asch, 
like Faw, taught all of the groups involved in his experiment. 
Three sections of about thirty to thirty-five students were taught 
by instructor-centered methods; one section of twenty-three stu- 
dents was taught by a student-centered method, quite similar to 
that of Faw. However, there were certain differences between Faw’s 
and Asch’s experiments. In Faw’s experiment both student-cen- 
tered and instructor-centered classes spent two hours a week lis- 
tening to lectures. In Asch’s experiment, only the instructor-cen- 
tered classes were subjected to lectures. While Faw doesn’t mention 
grading, one assumes that grades were determined by the instruc- 
tor on the basis of the course-wide examination. In Asch’s experi- 
ment students in the student-centered class were allowed to de- 
termine their own grades. 

The interesting thing is that Asch’s results do not completely 
agree with Faw’s. On the final examination in the course students 
in the instructor-centered classes scored significantly higher than 
members of the student-centered class, not only on the objective 
portion of the test but also on an essay portion. Note, however, 
that the student-centered class was specifically told that the exami- 
nation would in no way affect their grades in the course so that 
these differences may be simply due to a difference in motivation. 
As measured by the Bogardus Social Distance scale, attitude 
change in the two sections was not significantly different. However, 
as compared with the instructor-centered classes a greater percent- 
age of members of the student-centered class improved in adjust- 
ment as measured by the MMPI. 

Interestingly enough Asch’s students, like Faw’s, had a different 
perception of their achievement than that shown by the course 
examination. Faw’s student-centered class did better on the course 
examination than the instructor-centered section, but thought they 
would have learned more if they had been in an instructor-cen- 
tered class. Asch’s students rated the student-centered class higher 
than instructor-centered in helping them to learn the subject matter 
of the course but they actually scored lower. There seems to be 
some irony in the fact that advocates of student-centered methods 
find that students’ perceptions of group achievement are erroneous; 
yet many of us want students to take a larger share of the responsi- 
bility for evaluation, and are pleased that they report great gains 
in personal and social values in student-centered classes. If groups 
which report greater intellectual gains actually learn less, it might 
seem logical to conclude that groups which report greater gains in 
the areas of personal emotional growth may really be gaining less 
than those groups which report lesser gains. 

One of the most comprehensive experiments in this area is that 
of Landsman (5). He experimented with a student-centered type 
of teaching as contrasted with a more directed type of democratic 
discussion organized around a syllabus. His experimental design 
involved eight classes in a course sequence of Human Development, 
Adjustment, and Learning. Three instructors took part in the ex- 
periment and each instructor used both methods. Outcome meas- 
ures included the Horrock-Troyer tests, a local case history analysis 
test, Group Rorschach, MMPI, autobiographies, and students’ re- 
actions. His results showed no significant differences between methods. 


A REDEFINITION OF THE PROBLEM 


What are we to conclude from these studies? While there is a 
dearth of follow-up data, with such slender results at the end of the 
courses our hope that either method produces significantly greater 
long-time benefits is probably unrealistic. Should everyone go his 
own way and teach any way he pleases? Personally, we are not 
willing to go quite so far, but certainly none of us should exclaim 
with horror, “His classes are instructor-centered!” 

As psychologists, however, we believe in research. Why has re- 
search on student-centered versus instructor-centered teaching 
seemed to lead up a blind alley? 

One reason suggested for contradictory results is that different 
people have meant different things by student-centered. But a far 
more important reason is that we’ve been lumping together more 
variables than we could handle with our experimental designs. 
The first part of this paper was concerned about defining dimen- 
sions of difference because we need to work with a more limited 
number of variables and we need to relate these variables to the 
main body of psychological theory. 

Do we have any assurance that in the welter of student-centered 
versus instructor-centered research there are any variables that 
make a difference? Despite a somewhat pessimistic view of most 
research in this area, there are enough glimmerings of hope to 
justify further research. 

For example, Smith and Johnson (7) have found that student- 
centered teaching produces higher scores than instructor-centered 
teaching on tests of reasoning ability and creativity. Furthermore, 
all research on the problem seems to agree that as compared with 
instructor-centered teaching, student-centered teaching results in 
little decrement to the learning of facts (providing the classes have 
textbooks and tests are based on the texts). 

In addition Asch and Faw are not alone in feeling that adjust- 
ment and social skills are improved by student-centered methods. 
The research of Gibb and Gibb (3) indicates that students from 
group-centered classes which possessed many of the characteristics 
ordinarily called ‘student-centered’ actually produced a growth in 
social skills in experimental situations outside the classroom. Kelley 
and Pepitone (4) found an increase of empathy in classes which 
had been taught by student-centered methods. Efforts to produce 
such gains in instructor-centered classes have been unsuccessful. 

Obviously, we cannot test every possible interrelationship of the 
variables involved in student-centered classes, but many can be 
studied independently of the total method. For example, the vari- 
able of degree of student verbal participation in the classroom could 
be divorced from goals, cohesion, and other variables, and concep- 
tualized as providing an opportunity for student responses to be 
rewarded or punished. We might gain some insight into student- 
centered teaching by varying opportunities for verbal participation 
or by varying the percentage of comments which the instructor 
rewards or corrects. 

Another variable upon which we already have data is the degree 
to which the student feels able to influence his own fate. It seems 
significant that almost every one who tries student-centered teach- 
ing methods finds that the problem of grades presents the greatest 
obstacle to success. In addition two studies (3) (7) which produce 
evidence of change as a result of student-centered teaching both 
reduced the power of the instructor by giving the group responsi- 
bility for determining grades. 

Our own research at the University of Michigan supports our 
notion that the student’s feeling of freedom is an important vari- 
able. In an earlier paper (6), the writer reported that students made 
higher scores on examinations when they were given the oppor- 
tunity to write comments about test items. More recent experi- 
mentation indicates that the effect of this opportunity is reduced 
if the student is directed to write comments. Apparently the im- 
portant thing is that the student feel free to do so if he wishes to. 
In addition, it was found in another study that students pre- 
ferred a directive method of teaching which made clear what the 
student had to do in order to pass the course. Again, we would in- 
terpret this as meaning that the student felt better able to deter- 
mine his fate. 

Looked at in this way, the class in which the instructor insists 
on being non-directive may actually increase the student’s feeling 
of helplessness since he doesn’t know what to do in order to achieve 
the goal of a good grade in the course. In this situation the student 
may simply perceive the instructor as using his power to block the 
normal pathways to the goal. 

From this theory one would predict that the effect of instructor 
permissiveness would depend upon whether or not the group pos- 
sessed the skills necessary to achieve their goals. In a new group 
the effect of instructor permissiveness may depend upon the pres- 
ence or absence of individuals in the group who have had previous 
experience in working in democratic groups. If the instructor re- 
tains control of rewards, permissiveness with respect to means to 
the goal (such as assignments, classroom activities, etc.) may sim- 
ply increase the ambiguity of the situation for the student and 
reduce student learning. 

Obviously such speculations need to be tested. To make our 
research productive we need to sharpen both our theories and our 
methodology. 
AN IMPROVED FORMULA FOR SCORING 
CERTAIN GUESS-WHO RATINGS AT 
THE ADOLESCENT LEVEL 


EVAN R. KEISLAR 


University of California at Los Angeles 


The guess-who technique is a device whereby bi-polar trait 
ratings of a group of children may be obtained by using the mem- 
bers of this same group as the raters. It appears to be one of the 
few promising tools which yield information regarding the way 
each member of a large adolescent group is perceived by this group. 
But different assumptions and scoring methods have been used by 
investigators with the result that some ambiguity exists regarding 
the interpretation of scores yielded by this device. It was the aim 
of this study to develop a new scoring formula and to evaluate it 
in comparison with the two formulas previously applied when the 
guess-who technique is used for certain types of traits at the 
adolescent level. 


SCORING METHODS OF GUESS-WHO RATINGS 


In the usual form of this device, one or more pairs of statements 
are used for each trait, one statement of each pair representing 
behavior falling on what is arbitrarily designated as the positive 
side of the trait continuum, the other statement on the negative 
side. Raters are asked to indicate those members of the group who 
are judged to exhibit the behavior described by each statement. 
The CEI (2, pp. 221-36) used the following scoring method, re- 
ferred to as Formula A, for such ratings: 

Formula A.—Trait score = (No. of positive mentions) − (No. of 
negative mentions). 

When the guess-who ratings are obtained from adolescents, 
where the peer group is of necessity large, there is a wide varia- 
tion in how well different members are known by the raters. Con- 
sequently, Formula A may give excessively extreme scores for 
individuals who are widely known and scores which are probably 
too close to zero for those who are not. In most cases this factor of 
being widely known is not related to the trait in question. 

To avoid this deficiency Havighurst and Taba (3, p. 215) used 
what will be referred to as Formula B: 
Formula B.—Trait score = 

$$\frac{\text{No. of positive mentions}}{\text{Total No. of pos. and neg. mentions on the trait}} \times 100$$





But this formula appears to overcompensate, since it assigns 
the maximum score to each individual who receives only positive 
mentions regardless of their number. Furthermore, it requires 
discarding all cases which receive no mentions on the trait; many 
such persons may be well known but their behavior may be judged 
consistently by all raters to fall between the two statements. 

Faced with such problems of scoring, Kuhlen and Lee (4) de- 
vised what amounted to two categories, one containing all scores 
with a greater number of positive mentions and the other consist- 
ing of all remaining scores. Bonney (1) translated the guess-who 
ratings to an arbitrary five-point scale. But such methods offer no 
adequate solution. 

To attack this problem of scoring guess-who ratings obtained 
from large populations, let us assume that each pair of statements 
for a trait represents behavior falling beyond two points at ap- 
proximately equal distances in opposite directions from the mean 
on the trait continuum. We may then assume that the number of 
mentions a person receives on a statement is equal to the product 
of two variables: (1) the amount of deviation shown by the indi- 
vidual on the trait in the direction indicated by the statement, and 
(2) a measure of how widely known the person is, that is, the num- 
ber of raters who are acquainted with him. As an estimate of this 
second variable one may use the number of times an individual is 
mentioned by the raters on a wide variety of unrelated traits. By 
algebraically adding such deviations, after transposition, we obtain: 
Formula C.—Trait score = 

$$\frac{(\text{No. of pos. mentions}) - (\text{No. of neg. mentions})}{\text{Total No. of mentions on a variety of traits}} \times 100$$





This new formula, Formula C, is applicable only to traits rated 
on an instrument such as Tryon (5) developed in which a variety 
of trait ratings are obtained. It essentially yields trait scores which 
would be expected if all individuals being rated were mentioned 
the same number of times on the entire set of traits. It would ap- 
pear to be an improvement over Formula A, since it weights the 
number of mentions in inverse proportion to how widely known an 
individual is. It gives a score, as Formula B does not, to an indi- 
vidual who is not mentioned on the trait provided he is mentioned 
on others. An empirical comparison of Formula C with each of 
these other two formulas is therefore in order. 
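The three scoring rules can be set side by side in a short sketch. The function names and the example counts are illustrative assumptions, not data from the study. The denominator of Formula C is taken, as described above, to be the total number of mentions the individual receives over the whole set of traits.

```python
def formula_a(pos, neg):
    """Formula A: positive mentions minus negative mentions."""
    return pos - neg


def formula_b(pos, neg):
    """Formula B: positive mentions as a percentage of all mentions on the trait."""
    total = pos + neg
    if total == 0:
        return None  # no mentions on the trait, so no score can be assigned
    return 100 * pos / total


def formula_c(pos, neg, total_all_traits):
    """Formula C: (positive - negative) per 100 mentions over all traits."""
    if total_all_traits == 0:
        return None
    return 100 * (pos - neg) / total_all_traits


# Hypothetical pupils: (positive, negative, total mentions on all traits)
pupils = {
    "widely known": (12, 4, 160),
    "little known": (3, 1, 40),
    "one positive mention": (1, 0, 40),
    "unmentioned on trait": (0, 0, 60),
}
for name, (pos, neg, total) in pupils.items():
    print(name, formula_a(pos, neg), formula_b(pos, neg), formula_c(pos, neg, total))
```

With these made-up counts the first two pupils have the same balance of positive to negative mentions, and Formula C gives them the same score while Formula A separates them by a factor of four. Formula B gives the maximum score to the pupil with a single positive mention and no score at all to the pupil unmentioned on the trait.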


STATEMENT OF PROBLEM 


It was the purpose of the present study to compare the pro- 
posed Formula C with Formula A and with Formula B, as pre- 
viously defined, on two traits judged to be independent of this 
factor of being widely known, when the guess-who technique is 
used with secondary-school pupils. The bases of comparison were: 
(1) form of distribution yielded, (2) reliability of scores, and (3) 
validity of scores as measured by an independent criterion. 


SUBJECTS 


Guess-who ratings were obtained from one hundred and ninety- 
eight girls and one hundred and ninety-six boys out of the four 
hundred and thirty-six members of the B11 class in a senior high 
school located in a largely residential section of Los Angeles. These 
pupils were asked to rate the members of this class who were of 
their own sex. The group for whom ratings were scored consisted 
of the one hundred and seventy girls and one hundred and eighty- 
seven boys who had been enrolled a full school year. 


THE GUESS-WHO DEVICE USED 


Two forms of the guess-who rating blank were constructed, one 
for each sex. One pair of statements was used for each of twelve 
traits many of which were taken without major change from Tryon 
(5). The traits may be described by indicating, in the order in 
which they were listed on the rating blank, the opposing statements 
as follows: talkative—silent; old acting—young acting; friendly— 
unfriendly; likes school work—dislikes school work; considerate— 
inconsiderate; popular—unpopular (with opposite sex); not persist- 
ent—persistent; welcomed—ignored (by same sex); puts studies 
last—puts studies first; conceited—not conceited; cheerful—sad; 
influential—not influential (girls form only); athletically compe- 
tent—athletically incompetent (boys form only). Although, for the 
requirements of Formula C, traits could have been selected for the 
instrument which showed greater independence of each other, the 
variety appeared adequate. 


TRAITS SELECTED FOR SPECIAL STUDY 


Comparative data for the three formulas were obtained from the 
two traits ‘likes schoolwork’ and ‘puts studies first’ which are thus 
indicated by using only the statement assigned positive values. 
The wording of the statements for these two traits on the boys 
form, with appropriate changes for the girls in parentheses, was as 
follows: 

Here is someone who likes schoolwork; he (she) usually enjoys 
what he (she) does in his (her) classes: 

Here is someone who dislikes schoolwork; he (she) usually detests 
what he (she) does in his (her) classes: 

Here is someone who will not do other things until he (she) 
finishes his (her) studying: 

Here is someone who always spends his (her) time on other 
things even though he (she) has studying to do: 


ADMINISTRATION AND SCORING PROCEDURE 


In giving their ratings pupils were instructed not to sign their 
names and to indicate by means of pre-assigned code numbers 
those individuals they wished to nominate for each statement. 
The number of self-mentions, discovered by an invisible fluorescent 
ink device, was so small that they were not excluded in the scoring. 
The number of mentions received by the one hundred and seventy 
girls in the study on all twelve traits ranged from 5 to 375 with a 
mean of 68 and a standard deviation of 48. For the one hundred 
and eighty-seven boys these scores ranged from 7 to 379 with a mean 
of 59 and a standard deviation of 45. 

No Formula C scores were given to those pupils who received 
fewer than twenty-four mentions for the entire set of twelve traits. 
No Formula B scores were given to those pupils who received fewer 
than two mentions for the trait in question. Comparisons between 
Formula C and each of the other two scoring methods were made 
on the population of those individuals who received scores from 
both formulas being compared. 
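The selection rules just described can be sketched as follows. The two thresholds come from the text. The record layout, the function names, and the four example pupils are hypothetical, and the sketch assumes that Formula A excludes no one.

```python
MIN_TOTAL_FOR_C = 24  # minimum mentions over all twelve traits (from the text)
MIN_TRAIT_FOR_B = 2   # minimum mentions on the trait in question (from the text)


def eligible_for_c(record):
    """A pupil receives a Formula C score only with enough total mentions."""
    return record["total_mentions"] >= MIN_TOTAL_FOR_C


def eligible_for_b(record):
    """A pupil receives a Formula B score only with enough mentions on the trait."""
    return record["pos"] + record["neg"] >= MIN_TRAIT_FOR_B


def comparison_population(records, *rules):
    """Keep only pupils who receive a score from every formula being compared."""
    return [r for r in records if all(rule(r) for rule in rules)]


# Hypothetical records keyed by pre-assigned code number
records = [
    {"code": 101, "pos": 5, "neg": 1, "total_mentions": 80},
    {"code": 102, "pos": 1, "neg": 0, "total_mentions": 30},
    {"code": 103, "pos": 2, "neg": 2, "total_mentions": 20},
    {"code": 104, "pos": 0, "neg": 0, "total_mentions": 45},
]

a_vs_c = comparison_population(records, eligible_for_c)
b_vs_c = comparison_population(records, eligible_for_b, eligible_for_c)
print([r["code"] for r in a_vs_c])
print([r["code"] for r in b_vs_c])
```

Because the Formula B rule removes additional pupils, the population for a B versus C comparison is a subset of the population for an A versus C comparison, which is why the two kinds of comparison are reported with different N's.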


RESULTS 


Four distributions, one for each of the two traits described earlier 
for boys and girls separately, were obtained for each formula. The 
means for Formula A and Formula C ranged from −1 to +1, with 
standard deviations ranging from 5 to 11. For Formula B, the 
means ranged from 43 to 56 with all standard deviations approxi- 
mately 40. Correlations between Formula A and Formula C scores 
ranged from .77 to .82; those between Formula B and Formula C 
scores ranged from .84 to .89. Since these scores were obtained from 
the same set of ratings, it is obvious that the formulas are yielding 
somewhat different results. 

1) Form of distribution.—All four of the distributions for Form- 
ula A were highly leptokurtic. On the basis of the chi-square test, 
the hypothesis of normality could be rejected for each of these far 
beyond the .001 level. On the other hand all four of the distribu- 
tions for Formula B were definitely bi-modal. Since about half of 
the scores were located at the extremes no test of normality was 
made. Although Havighurst and Taba used several pairs of state- 
ments for each trait, the bi-modal distributions usually found for 
this formula were taken as a sign of the reliability of the trait 
ratings (3, p. 217). 

For Formula C, however, the distributions approximated nor- 
mality. In three of these distributions the hypothesis of normality 
could not be rejected at the .05 level. For the fourth—liking of 
schoolwork by boys—the hypothesis could not be rejected at the 
.01 level. This last distribution is shown in Fig. 3 with the corre- 
sponding distributions from the other two formulas in Fig. 1 and 
Fig. 2. We may conclude that Formula C is the only one of the 
three methods of scoring which yielded distributions approximating 
normality on the two traits selected for study. Tests on most of 
the remaining ten traits in this instrument showed in every case 
that the new formula produced normal distributions. 

Fig. 3. Distribution of Formula C Scores on ‘Likes Schoolwork’ for 165 
Boys. 
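A chi-square test of normality of the kind referred to above can be sketched with the standard library alone. The study does not describe its class intervals, so the use of eight equal-probability bins under a normal curve fitted to the sample mean and standard deviation is an assumption, as are the function name and the two artificial samples.

```python
from statistics import NormalDist, fmean, pstdev


def chi_square_normality(scores, n_bins=8):
    """Chi-square statistic comparing binned scores with a fitted normal curve."""
    n = len(scores)
    fitted = NormalDist(fmean(scores), pstdev(scores))
    # Bin edges chosen so that each bin has equal probability under the fit.
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    observed = [0] * n_bins
    for score in scores:
        observed[sum(score > edge for edge in edges)] += 1
    expected = n / n_bins
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    # Degrees of freedom: bins minus one, minus two estimated parameters.
    degrees_of_freedom = n_bins - 3
    return statistic, degrees_of_freedom


# Artificial samples: one close to normal, one piled up at the two extremes
near_normal = [NormalDist(0, 8).inv_cdf((i + 0.5) / 200) for i in range(200)]
bimodal = [0] * 100 + [100] * 100

normal_stat, dof = chi_square_normality(near_normal)
bimodal_stat, _ = chi_square_normality(bimodal)
print(normal_stat, bimodal_stat, dof)
```

The statistic is then compared with a tabled chi-square value for the stated degrees of freedom. A sample with about half its scores at each extreme, as reported for Formula B, produces a far larger statistic than a sample that follows the fitted curve.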
2) Reliability.—The split-half reliabilities for the two traits for 
Formula A were .95 and .85 for the girls and .95 and .92 for the 
boys. But it was not possible to use the split-half method with 
Formulas B and C since the total scores are not merely the sums 
of the scores obtained from two groups of raters. Instead, since 
correlation coefficients between the two traits on Formula A cor- 
rected for attenuation were .98 for the girls and .99 for the boys, 
it was assumed the two traits were sufficiently similar to be looked 
upon as two forms of a rating device each of which was measuring 
approach-avoidance behavior with respect to schoolwork. Hence 
the correlations between the two traits were taken as estimates of 
the reliabilities offered by the different scoring methods. Such a 
procedure gives slightly spurious results for Formula C because 
the two sets of scores constitute indexes with the same denominator 
term. However, when the denominator (total number of mentions) 
was partialled out none of the correlations were reduced by more 
than .03. 
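The correction for attenuation used in this argument can be sketched as follows. The formula is the standard one. Applying it to the rounded boys' figures (a correlation of .93 between the two traits under Formula A, with split-half reliabilities of .95 and .92) is an illustration only, since the published coefficients were rounded after computation and may rest on slightly different populations.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores.

    r_xy is the observed correlation between the two measures;
    r_xx and r_yy are their reliabilities.
    """
    return r_xy / sqrt(r_xx * r_yy)


# Rounded boys' figures for Formula A, used here only as an illustration
corrected = correct_for_attenuation(0.93, 0.95, 0.92)
print(round(corrected, 2))
```

A corrected value this close to unity is what justifies treating the two traits as parallel forms of one rating device.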

TABLE I.—Comparisons of Reliabilities of Scores Yielded by Three 
Formulas. Based on Correlations Between Two Traits, “Likes Schoolwork” 
and “Puts Studies First,” for Boys and for Girls Separately 

Sex     Formula    N      r      Diff. bet. corresponding z’s    σ_D*    D/σ_D 
Boys    A          165    .93 
        C          165    .73    .52                             .11     4.67** 
        B          113    .74 
        C          113    .80    .14                             .14     1.04 
Girls   A          161    .86 
        C          161    .73    .38                             .11     3.41** 
        B          110    .68 
        C          110    .79    .26                             .14     1.90 

Note: All figures reported have been rounded off after computation to 
two decimal places. 

* Computed without considering a correlational term. 

** Significant at .01 level. 

In Table I the reliability coefficients for each of the two formulas 
being compared were obtained from identical populations as 
indicated by identical N’s. The reliabilities given by Formula A are 
higher, both differences being significant at the .01 level, than 
those given by Formula C. This may be attributed in part to the 
highly leptokurtic nature of the distributions yielded by the first 
formula, thus allowing a few scores to contribute disproportionately 
to the covariance. Formula C gives estimated reliabilities which are 


TABLE II.—COMPARISONS OF CORRELATIONS WITH SCHOOL GRADES
AND SCORES YIELDED BY THREE FORMULAS. BASED ON TWO
TRAITS FOR BOYS AND FOR GIRLS SEPARATELY

Trait                Sex      Scored by    N      r              Diff. bet.           D/σ_D (a)
                              formula                            corresponding z's
Likes schoolwork     Boys     A            165    .61
                              C            165    .71            .18                  2.78**
                              B            140    .74
                              C            140    .7[illegible]  .05                  [illegible]
                     Girls    A            161    .61
                              C            161    .74            .25                  3.77**
                              B            133    .68
                              C            133    [illegible]    .22                  3.17**
Puts studies first   Boys     A            165    .57
                              C            165    .66            .14                  2.18*
                              B            126    .63
                              C            126    .69            .10                  1.75
                     Girls    A            161    .59
                              C            161    .68            .16                  2.66**
                              B            124    .70
                              C            124    .74            .09                  1.52

Note: All figures reported have been rounded off after computation to
two decimal places.

(a) All values of σ_D were either .06 or .07. These were computed from the
formula including the correlation term given by Quinn McNemar, Psychological
Statistics (John Wiley and Sons, New York, 1949), pp. 124-125. The
eight values of r_12 ranged from .66 to .79.

* Significant at .05 level.

** Significant at .01 level.

higher than those yielded by Formula B, but these differences are 
not significant on the basis of the test used. 
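
The kind of test reported in Table I can be illustrated with a short sketch. It assumes the conventional Fisher z transformation and, following the footnote to that table, a standard error with no correlational term. The function names are illustrative and do not come from the original study. Because the published figures were rounded after computation, a ratio computed from the rounded r's will differ somewhat from the tabled value.

```python
import math


def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))


def z_difference_ratio(r1, n1, r2, n2):
    """Return (D, sigma_D, D / sigma_D) for two correlations.

    Assumption: the two r's are treated as independent, i.e. the
    standard error omits any correlational term.
    """
    d = fisher_z(r1) - fisher_z(r2)
    sigma_d = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return d, sigma_d, d / sigma_d


# Girls, Formula A versus Formula C, using the rounded values in Table I.
d, sigma_d, ratio = z_difference_ratio(0.86, 161, 0.73, 161)
print(f"D = {d:.2f}, sigma_D = {sigma_d:.2f}, D/sigma_D = {ratio:.2f}")
```

A ratio above 2.58 corresponds to significance at the .01 level for a two-tailed normal-curve test.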

3) Validity.—The best validity criterion available for the two 
traits was grade point average during the pupils’ previous year in 
high school. Although studying and liking of schoolwork are pre- 
sumably not to be identified with good marks, there is evidence 
that the differences are not as large as might be expected. The 
partial correlations between each of the two traits, as scored by 
Formula C, and IQ, with grades held constant, are close to zero in- 
stead of being clearly negative. It would seem likely that the raters 
were highly influenced in their judgments by school marks pupils 
received or that teachers give grades partly on the basis of the 
interest shown by the student and completion of daily assign- 
ments, or perhaps both. 

Grades were changed to a numerical scale where, for example, A 
= 9, D and F = 0. The mean grade for the girls on this scale was 
4.7 with a standard deviation of 2.1. For boys the mean grade was 
4.0 with a standard deviation of 2.4. The correlations between 
grades and the scores on both traits for the three formulas are 
given in Table II. As before, identical N’s indicate identical popu- 
lations. 

The correlations for Formula C were higher than those for Formula A 
at the .01 level of significance in three cases. In the fourth the difference 
was significant at the .05 level. We may conclude that, if grades 
constitute an adequate criterion, Formula C yields higher validities 
for these trait ratings than does Formula A. Comparable data 
for Formulas B and C indicate that in each case Formula C yields 
higher validity coefficients, but only one of the differences is sig- 
nificant. 


DISCUSSION AND CONCLUSIONS 


1) Perhaps the most important finding in these comparisons is 
that the proposed formula was the only one of the three scoring 
methods which yielded normal distributions for these two traits. 
Although we have no evidence as to how the scores on a trait are 
distributed, normal distributions are to be preferred for practical 
reasons. 

2) The general superiority of Formula C over Formula A with 
respect to validity would support the assumption, for traits and 
populations similar to those used in this study, that it is advisable 
to weight in some manner the number of mentions on a guess-who 
statement in inverse proportion to a measure of how widely known 
the individual is among the rating group. 

3) While the use of the total number of mentions on a variety 
of traits has given fairly satisfactory results as an estimate of ex- 
tent of being known among the raters, a superior measure of this 
variable might be obtained by having raters check off on a list 
those members of the group whom they know. This would allow 
an investigator to use fewer traits with several pairs of statements 
for each trait. 


SUMMARY 


A new method for scoring guess-who ratings was proposed for 
those situations at the adolescent level (1) where the population 
being rated is so large that there exists considerable variation in 
how widely known by the group the members are, and (2) where 
ratings are being obtained on bi-polar traits which are judged to be 
relatively independent of this factor of being widely known. When 
guess-who ratings were obtained from a large high-school class on 
two traits dealing with approach-avoidance behavior with respect 
to schoolwork, the proposed formula was judged to be superior to 
each of the other two scoring methods which have been previously 
used. The new method of scoring produced higher validity coeffi- 
cients with school grades as a criterion and was the only formula 
which yielded normal distributions. Some support was given to the 
assumptions made by the new formula. 
A COMPARISON OF MENTAL ABILITIES 
OF BRIGHT AND DULL CHILDREN OF 
COMPARABLE MENTAL AGES¹ 


OLIVER P. KOLSTOE 


University of Illinois 


The usefulness of the mental age as a unit of measurement de- 
pends upon several considerations. One is the extent to which 
children who achieve the same MA score are alike in their mental 
abilities. This investigation is concerned with one special case of 
this issue; namely, the question of whether or not there are sys- 
tematic differences in performance on intellectual tasks between 
bright and dull children who achieve like MA scores; i.e., like 
general averages. Do MA’s of 10, for example, achieved by bright 
seven-year-old children differ systematically from MA’s of 10 
achieved by dull thirteen-year-old children? Are the MA’s earned 
by the two groups the same MA? Do we need to specify the age of a 
child in order to interpret his MA score? 

Many investigators have tried to discover whether there are 
systematic qualitative differences between groups of children of 
varying chronological ages but of equal mental ages. In most 
instances the groups have been matched for comparison on the 
basis of mental age scores obtained from the administration of the 
Stanford-Binet scale. The investigators then identified the items 
of the Stanford-Binet which discriminated between groups of young 
children and older children equated on the basis of mental age. 
The item analysis data thus obtained were then presented as 
evidence of differences in the mental characteristics of the two 
groups. 

This procedure is open to question for two reasons. In the first 
place, the criterion measures are not independent of those used for 
selecting and matching the groups. The requirement of equality 
of total mental age scores necessitates a balancing of differences 
between the two groups on the individual items of the test. Thus, 
superiority in one kind of task for one group necessitates superiority 





1 Based on the writer’s doctor’s dissertation, State University of Iowa, 
1952, done under the direction of Professors A. N. Hieronymus and J. B. 
Stroud. 
in another kind of task for the other group, since the total mental 
age scores for the two groups are equal. The fact that the selection 
and criterion measures are not independent makes it desirable that 
the differences observed in these studies be evaluated by a pro- 
cedure in which the sources of differences are independent of the 
measures used for selection and matching. 

In the second place, it is also questionable whether the matching 
procedure used in these former studies is defensible. When sub- 
jects are matched on the basis of obtained mental age scores, the 
estimated true mental age scores of the older, dull subjects are 
certain to exceed those of the younger, bright subjects, on the 
average. Thus, if groups are matched on obtained mental age 
scores, one would expect the low IQ group to obtain higher scores, 
on the average, than the high IQ group on intellectual tasks which 
are administered independently of the matching test. 

The present study differs from those previously concerned with 
the problem in that (1) the low and high IQ groups were matched 
on the basis of estimated true mental age scores obtained by re- 
gressing their obtained scores; (2) tests used in the selection of the 
sample are not used as criterion tests; (3) the criterion tests were 
administered independently of those used for selection and match- 
ing and were more extensive than those used by other investigators. 


RELATED RESEARCH 


One of the earliest investigations of differences between old (dull) 
and young (bright) children having comparable mental ages was 
reported by Merrill in 1924. She concluded that the retarded 
group tended to excel on those types of items on which maturity, 
life age, and experience were favorable to success. In 1929 Wallin, 
finding the dull superior to the bright in the tasks of Counting 
Backward, and Vocabulary, pointed to age and experience as 
determining factors in this success. Laycock and Stanley in 1942 
found the retarded group more successful in reproductive tasks, 
while the bright were superior in memory and reasoning. The same 
year, Rautman reported the bright superior to the dull in tasks 
involving manipulation. Rameseshan in 1949 found the bright 
superior on the Primary Mental Abilities of Verbal Meaning and 
Reasoning, whereas the dull were superior on Space. In examining 
the results of scores made on the California Tests of Mental Maturity, 
Unsicker in 1950 found the bright superior on delayed recall and 
reasoning, and the dull superior in spacial relations and numerical 
reasoning. 

In the most extensive of the studies found dealing with the prob- 
lem, Thompson and Magaret (1947 and 1950) found normal 
superior to dull children of comparable MA’s in rote memory. In 
the second article they reported some evidence to indicate a 
special manual ability associated with mental deficiency. 


PROCEDURE 


Sample.—Two groups of children of approximately equal mental 
age but markedly different in chronological age and IQ were used 
in the study. A ‘groups by levels’ design was employed in the 
analysis of results. The groups consisted of twenty-nine third- 
and fourth-grade children with estimated true Stanford-Binet 
IQ’s of 116 or above, and twenty-nine eighth- and ninth-grade 
pupils with estimated true IQ’s of 84 or below. Four six-months 
intervals of estimated true mental ages, ranging from ten years, 
seven months to twelve years, six months, constituted the levels. 

The two groups were selected from classes in one mid-western 
city. Scores on the California Tests of Mental Maturity, which had 
been given previously to the children in this school system, were 
used to select pupils who appeared likely to fall into the two groups 
of bright and dull. The Stanford-Binet scale was then administered 
to the pupils so selected. Final selection of the subjects in the two 
groups was made upon the basis of the Stanford-Binet scores. 

Estimated true scores used in constituting the two groups and 
MA levels within the groups were obtained by the use of standard 
regression equations. The reliability coefficients (Stanford-Binet 
Form L) used in the equations were: .982 for IQ’s below 70; .945 
for IQ’s 70 to 83; .912 for IQ’s 116 to 129; and .898 for IQ’s 130 
and above. Subjects in each group were assigned to levels on the 
basis of the estimated true mental age scores which resulted from 
the regression of the obtained scores. No further use was made 
of the Stanford-Binet results. 
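
As a rough illustration of the regression step, the sketch below applies the usual estimated-true-score equation, T = M + r(X - M), to an obtained IQ and converts the result back to a mental age. The population mean IQ of 100, the ratio-IQ conversion MA = IQ x CA / 100, and the function names are assumptions made for the example, since the text does not print the equations it used. No reliability is quoted for IQ's from 84 to 115, a range the study did not sample, so the function refuses those values.

```python
def reliability_for_iq(iq):
    """Stanford-Binet Form L reliabilities quoted in the text, by IQ band."""
    if iq < 70:
        return 0.982
    if iq < 84:
        return 0.945
    if 116 <= iq < 130:
        return 0.912
    if iq >= 130:
        return 0.898
    raise ValueError("no reliability is given in the text for this IQ range")


def estimated_true_iq(iq, mean_iq=100.0):
    """Regress an obtained IQ toward the assumed population mean."""
    return mean_iq + reliability_for_iq(iq) * (iq - mean_iq)


def estimated_true_ma(obtained_ma, ca):
    """Estimated true mental age (years) from obtained MA and age in years."""
    obtained_iq = 100.0 * obtained_ma / ca
    return estimated_true_iq(obtained_iq) * ca / 100.0


# Two hypothetical children who both obtain an MA of 10 years.
bright = estimated_true_ma(10.0, 7.0)   # bright seven-year-old
dull = estimated_true_ma(10.0, 13.0)    # dull thirteen-year-old
print(f"bright: {bright:.2f} years, dull: {dull:.2f} years")
```

Under these assumptions the older, dull child's estimated true MA comes out above the younger, bright child's, which is the direction of bias described for matching on obtained scores.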





2 A companion study by Bliesmer involving essentially the same samples 
as those used by the writer, was conducted to determine the nature of 
systematic differences in a number of reading abilities between bright and 
dull children of approximately equal mental age. Emory P. Bliesmer, A 
Comparison of Bright and Dull Children of Comparable Mental Ages with 
Respect to Various Reading Abilities, Doctor’s Dissertation, State 
University of Iowa, 1952, pp. 153. 
While the samples were not selected so as to control the number 
of boys and girls in each group, the sexes are represented in ap- 
proximately equal numbers in each group. 

Following is a diagram of the design as employed in the in- 
vestigation: 

















MA Level    MA (Est’d. True)    Criterion Scores
                                Bright Group    Dull Group
I           12-1 to 12-6        n = 7           n = 7
II          11-7 to 12-0        n = 10          n = 10
III         11-1 to 11-6        n = 7           n = 7
IV          10-7 to 11-0        n = 5           n = 5








Measuring Instruments.—Used as criterion measures were the 
subtests of the Wechsler Intelligence Scale for Children, the Benton 
Test of Visual Retention (1945), an original Speed of Symbol 
Copying test, and the Chicago Tests of Primary Mental Abilities for 
ages eleven to seventeen. All of the criterion tests except the last 
were administered in individual testing sessions. The Symbol 
Copying test was an adaptation of the WISC coding test, in which 
the subject was required only to copy the symbols presented in 
the top boxes into the bottom boxes. The tests were administered 
by four qualified examiners who worked independently. All of the 
tests were scored and rescored under the supervision of the writer. 
The results are reported in terms of simple raw scores. 

Analysis of Results.—For each criterion variable, the hypothesis 
tested is that there is no difference between the means of the popu- 
lations of which the bright and dull groups are samples. The test 
of this hypothesis is the ratio of the mean square for groups to the 
mean square for within-cells.³ 
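
The test statistic can be sketched as follows. The function computes the ratio of the mean square for groups to the mean square within cells for a groups-by-levels layout. The data layout, the names, and the toy scores are hypothetical, and the simple decomposition assumes cell frequencies that are proportional across groups, as they are in this design.

```python
def groups_f_ratio(cells):
    """Return (F, df_groups, df_within) for a groups-by-levels layout.

    cells[g][l] is the list of criterion scores for group g at level l.
    Assumption: cell frequencies are proportional across groups.
    """
    all_scores = [x for group in cells for cell in group for x in cell]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total

    # Sum of squares between groups, pooling each group over its levels.
    ss_groups = 0.0
    for group in cells:
        scores = [x for cell in group for x in cell]
        group_mean = sum(scores) / len(scores)
        ss_groups += len(scores) * (group_mean - grand_mean) ** 2

    # Sum of squares within cells, taken about each cell's own mean.
    ss_within = 0.0
    n_cells = 0
    for group in cells:
        for cell in group:
            cell_mean = sum(cell) / len(cell)
            ss_within += sum((x - cell_mean) ** 2 for x in cell)
            n_cells += 1

    df_groups = len(cells) - 1
    df_within = n_total - n_cells
    f_ratio = (ss_groups / df_groups) / (ss_within / df_within)
    return f_ratio, df_groups, df_within


# Toy data: two groups, two levels, two scores per cell.
toy = [
    [[1, 3], [5, 7]],   # hypothetical "bright" group, levels I and II
    [[3, 5], [7, 9]],   # hypothetical "dull" group, levels I and II
]
f_ratio, df_groups, df_within = groups_f_ratio(toy)
print(f_ratio, df_groups, df_within)
```

With two groups, four levels, and fifty-eight subjects, the same computation would give 1 and 50 degrees of freedom.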





3 This ratio is a valid test of the hypothesis of equal population means 
only if the following conditions are satisfied: (1) The subgroup samples at 
each level are drawn independently at random from corresponding sub- 
populations. (2) The distribution of criterion measures in each subpopula- 
tion is normal and all have the same variance. (3) The numbers of measures 
in corresponding subgroups are in the same proportion for all groups and 
this proportion is the same as that among the numbers in the corresponding 
levels in the population. 

There is no reason to question seriously the assumptions of randomness, 
normality, and homogeneity. The third condition listed above is definitely 
A five per cent coefficient of risk was selected in advance of the 
analyses. Mean square ratios were considered significant if they 
equalled or exceeded the five per cent value in the F table. Through- 
out the investigation differences are reported as either significant or 
not significant at the five per cent level of confidence. 


RESULTS 


A summary of the differences between means of the groups for 
the various tests is given in Table I. In most cases the differences 
are small and not statistically significant; six of the twenty are 
significant at the five per cent level. 

The dull group was superior to the bright on the WISC subtests 
of Comprehension and Coding, on the Symbol Copying test scored 
for speed, and on the PMA subtest, Number. The bright group 
was superior to the dull on the WISC subtest Digits, and on the 
Symbol Copying test scored for accuracy. 

It is suggested that the superiority of the dull group on the PMA 
subtest Number was due to the fact that this group, being made 
up of eighth- and ninth-grade pupils, had received instruction in 
certain arithmetic operations, specifically multiplication, to which 
at least part of the bright children, who were still in the third 
and fourth grades, had not been introduced. Thus, the superior 
performance of the dull group appears to reflect no qualitative 
intellectual difference, but rather, a curricular one. The results 
support a hypothesis of no difference between the two groups in the 
primary mental abilities. 

Numerous investigators have pointed to a superiority of the dull 





not satisfied in the present study since there is no possible distribution of 
numbers of subjects from level to level which can simultaneously be repre- 
sentative proportionally of the mental age levels in the populations of 
bright and dull subjects from which the samples were drawn. This lack of 
representativeness is of consequence only in the presence of a groups by 
levels interaction. There was little or no a priori basis for suspecting 
interaction (different group differences from level to level) in the present study, 
especially since each of the groups is fairly homogeneous with respect to 
age, mental age, and IQ. The significance of the groups by levels interaction 
was tested in each of the analyses by computing the ratio of the mean square 
for interaction to the mean square for within cells. In the event of a sig- 
nificant interaction the generalizing of the test of main effects (total-group 
differences) would be restricted to hypothetical populations showing the 
same proportions of subjects in the mental age levels as are found in the 
samples. None of the interactions were significant at the five per cent level. 
TABLE I.—SUMMARY OF DIFFERENCES BETWEEN THE RAW SCORE MEANS
OF THE BRIGHT AND DULL GROUPS FOR ALL CRITERION TESTS

Test                  M_B       M_D       M_B - M_D     F
WISC
  Information         14.17     14.07     +  .10          .04
  Comprehension       12.24     13.90     - 1.66         9.69*
  Arithmetic           9.31     10.21     -  .90         3.94
  Similarities        11.55     12.03     -  .48          .59
  Vocabulary          34.90     34.24     +  .66          .40
  Digits               9.93      9.00     +  .93         8.85*
  Pict. Comp.         11.97     11.28     +  .69          .93
  Pict. Arrange.      33.35     31.48     + 1.87          .99
  Blk. Design         22.48     23.07     -  .59          .05
  Obj. Assemb.        21.52     22.48     -  .96          .74
  Coding              36.10     48.55     -12.45        30.63*
Benton                12.28     11.83     +  .45          .98
Symbol Speed†         91.45    141.07     -50.62        29.65*
Symbol Accur.         91.65     87.28     + 5.37         9.54*
PMA
  Number              15.39     44.12     -28.73        32.99*
  Verbal              30.23     30.46     -  .23          .01
  Space               15.42     15.69     -  .27          .01
  Word Fluency        33.23     33.35     -  .12          .01
  Reasoning           14.23     16.65     - 2.42         1.92
  Memory               7.46      6.88     +  .58          .38

* Significant at the 5% level of confidence.
† Symbol Speed scores are complements of time scores.


in tasks involving manual manipulation skills. The results of this 
investigation indicate a significant superiority of the dull group in 
performance on the WISC subtest Coding, and on the speed score 
of the Symbol Copying test. However, while it is true that the dull 
group did better than the bright on the speed part of the Symbol 
Copying test, when the test was scored for accuracy the bright 
group was superior. It is reasonable to believe that careful attention 
to neatness and accuracy, generally emphasized in the lower 
elementary grades, may have slowed down the speed of the bright 
group to a significant extent, thus leaving the relative superiority 
of the two groups on this subtest something of a question. 
Moreover, the performance of the two groups on the WISC 
subtests Picture Completion, Picture Arrangement, Block Design and 
Object Assembly was not significantly different. These tests go 
to make up the performance part of the WISC. If the dull group 
were really superior in manual manipulation skill, they could be 
expected to be fairly consistently superior to the bright group on 
tests which ostensibly require such skill. There are manual 
manipulative skills performance on which correlates with criteria of 
intelligence, and those performance on which does not. It is 
suggested here that if the dull older children are superior to the 
bright younger children on certain manipulative tasks, these tasks 
may not be good intelligence tests in the first place. 

The performance of the dull group on the WISC subtest Com- 
prehension was significantly superior to that of the bright. An 
examination of the kinds of questions which are included in this 
subtest suggests that successful answers depend primarily upon 
experience (for example, “Why is it better to build a house of 
brick than of wood?”). 

The finding that the bright group was superior to the dull in 
their performance on the WISC subtest Digits supports the writings 
of other investigators who have reported a rote memory superiority 
for the bright groups. However, there is reason to doubt that they 
have any general superiority in memory ability since the small 
differences in favor of the bright group on the PMA subtest 
Memory and on the Benton test were not significant. 

The results of the present study support to a very considerable 
extent the generality of the mental age concept. No evidence was 
found to support the claims of some writers that the bright are 
superior in such mental tasks as vocabulary, reasoning, and a 
general memory ability. Nor does the evidence from the study 
indicate superiority of the dull older children in performance tasks 
or manual manipulation. 

Certainly this investigation has failed to reveal anything very 
wrong with the use of the Stanford-Binet MA as a measure of level 
of mental development. 
APPLICATIONAL TRANSFER AND INHIBITION¹ 


DONALD E. P. SMITH 
University of Michigan 


Progress is being made in formulating the relationships between 
variables belonging to different areas of psychological investigation 
(1, 2, 4, 6, 7). Miller and Dollard (2) have discussed the effect 
of repression upon ideation from the standpoint of learning theory. 
Others (1, 6) have reported data on the influence of affect on per- 
ceptual development. 

The relation with which the present research is concerned is the 
effect of certain kinds of feeling-tone in the classroom learning situ- 
ation upon ‘applicational transfer’ of learning. An achievement 
test was constructed for the purpose of measuring the student’s 
ability to use information for solving problems in life-like situa- 
tions. The position taken is that possession of information does not 
necessarily imply ability to use that information, but rather that 
such ability is partially dependent upon the amount and quality 
of emotional involvement in the learning situation and upon the 
personality structure of the learner. The extent to which products 
of past learning have been inhibited—one dimension of personality 
structure—is the subject of this paper. 

The specific hypothesis to be tested is that limited ability to 
apply known principles in problem-solving is, in part, a function 
of the extent to which previously learned words and ideas have 
been inhibited or blocked. Applicational transfer is defined as a 
process by which relevant concepts are brought to conscious ap- 
praisal through the operation of necessary and sufficient generali- 
zation in the attempt to solve a problem. Inhibition as used herein 
refers to interference with the associative reaction in the normal 
operation of a ‘cue-producing’ response (2: p. 322), with a conse- 
quent reduction in efficient ideation. The cue-producing response 
which is blocked is viewed as capable of arousing feelings of fear 





1 A portion of the writer’s doctoral dissertation entitled Applicational 
Transfer: Its Nature, Measurement and Improvement sponsored by M. D. 
Glock, Cornell University. The writer is indebted to Professors J. E. Hoch- 
berg, E. L. Walker, Ronald Lippitt and E. Lowell Kelly for critical reading 
of the manuscript. 
or anxiety as a result of previous learning.² Symptoms of inter- 
ference with the associative reaction, according to Rapaport (8: 
p. 9), are delay or absence of response, or repetition of the stimulus 
word in a word-association test. 

The generalization process underlying applicational transfer is 
considered dependent for its sufficiency not only upon breadth of 
relevant meaningful experiences, but also upon relative absence of 
interference with necessary cue-producing responses. This view 
finds indirect support in Razran’s report of studies in semantic 
generalization: “. . . generalization [of conditioned stimuli—single 
words] to the other word categories [synonyms, antonyms, etc.] 
was the greater... the faster their reaction time in controlled as- 
sociation tests” (9: p. 355), whereas, as noted above, interference 
is characterized by delay or absence of reaction in a word associa- 
tion test. 

The word association test was selected as a method appropriate 
for the measurement of inhibition as defined. An index of applica- 
tional transfer ability was provided by the differential between 
two scores on an achievement test in general psychology, reported 
elsewhere (11). The test consists of two parts, one purporting to 
measure information, the other, application, with each having 
eighty multiple-choice questions. The application part is composed 
of five reading selections concerning mental hygiene, learning, 
marital adjustment, all in story or anecdotal form, and two reports 
of experiments, followed by questions requiring identification of a 
principle with an example given, of an example with the principle given, of 
facts crucial to the solution of a problem, and others. 

The problem is whether superior appliers and inferior appliers 
show a similar differentiation in respect to latency on a word 
association test; i.e., do superior appliers show brief latency and 
inferior ones delayed latency, as might be predicted from the 
foregoing analysis? 


SUBJECTS 


Ss were fifteen introductory psychology students, each assigned 
to one of two groups on the basis of having a differential between 
his score on the application section of an achievement test in 





2 Dollard and Miller refer to this interference as ‘repression’ (2: p. 322) 
but the process, viewed empirically, also appears to fit O. H. Mowrer’s 
formulation of the ‘anticipatory response’ (6: p. 131). 
TABLE I.—CHARACTERISTICS OF SUPERIOR AND INFERIOR APPLIERS

                     W-B            Word       Achievement     Applica-    Informa-
Group         N      Information    Fluency    Test (total)    tion        tion        Diff.
Superior      9      19.1           57.7       75.3            33.1        42.2         9.1
Inferior      6      19.0           48.0       72.0            24.0        48.0        24.0
Difference             .1            9.7        3.3             9.1         5.8        14.9
P                      —             .24        —               .05         .10         .01


























psychology and his score on the information section sufficient to 
place him one SD above or below the mean of the group differential. 
The ‘inferior appliers’ (N = 6) scored, at the mean, 24 points 
lower on the application test than on the information test (see 
Table I). ‘Superior appliers’ (N = 9) scored, at the mean, 9.1 points 
lower on the application than on the information test. 
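
The selection rule can be sketched as below. The scores are invented for illustration, the differential is taken as information minus application, and the use of the population form of the standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics


def classify_appliers(scores):
    """Split subjects by their information-minus-application differential.

    scores maps a (hypothetical) subject label to a pair
    (information_score, application_score). Subjects whose differential
    lies at least one SD below the group mean are called superior
    appliers; those at least one SD above it, inferior appliers.
    """
    diffs = {s: info - app for s, (info, app) in scores.items()}
    values = list(diffs.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD
    superior = sorted(s for s, d in diffs.items() if d <= mean - sd)
    inferior = sorted(s for s, d in diffs.items() if d >= mean + sd)
    return superior, inferior


# Invented scores for six subjects: (information, application).
example = {
    "S1": (45, 40),
    "S2": (44, 30),
    "S3": (46, 31),
    "S4": (47, 20),
    "S5": (43, 29),
    "S6": (45, 30),
}
superior, inferior = classify_appliers(example)
print(superior, inferior)
```

Subjects falling within one SD of the mean differential belong to neither group under this rule.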

Despite the extreme differentials, there was no significant differ- 
ence between the groups in total score on the achievement test in 
psychology, in scores on the Wechsler-Bellevue information subtest, 
nor in scores on the Thurstone Primary Mental Abilities subtest 
of word-fluency. Those scores for the achievement test and for the 
information subtest are virtually identical. 


APPARATUS AND PROCEDURE 


The word association test has traditionally been used as an aid 
in determining inhibition (8: p. 35), conflict (3) or emotional dis- 
turbance (12: p. 363). Although qualitative analysis of responses 
is ordinarily used with individual cases, reaction time has yielded 
satisfactory results for groups (3, 10). One criterion of inhibition 
has been a latency greater than that of normal subjects. Gardner 
(3) reports a mean reaction time to ‘neutral’ words (e.g., tree, lamp, 
grass, salt) of 1.85 sec. for normal college students. 

A group of fifty-seven words was adapted from a list of ‘emo- 
tionally-toned’ and ‘neutral’ words, so termed on the basis of 
‘practically universal agreement among competent judges’, re- 
ported by Sharp (10: pp. 109-110). The words were chosen for 
their significance in several problem areas, viz., academic (study, 
exam), social (alone, friend), family (mother, father), and religious 
(sin, God). These were arranged with an attempt to alternate 
critical with neutral words, and were transcribed at 20 sec. intervals 
on an Ampro Tape Recorder. The recording was utilized in order 
to increase the objectivity of the testing situation. Alternation 
with neutral words and use of intervals were introduced for the 
purpose of reducing the effect of possible perseveration due to 
emotional arousal. 

Ss, who were tested ‘blind’ in a standard situation, were in- 
structed to respond orally with the first word which came to mind 
after each stimulus word was heard, to respond to all words, and 
to attempt to keep the mind blank between words. Latency was 
recorded in fifths of a second by means of a stop-watch. 


RESULTS 


Mean seconds for each group are reported in Table II. The re- 
sults clearly indicate differences favoring the superior applier group. 
All differences are significant beyond either the .01 or .02 level of 
confidence. Areas yielding the largest differences are religious and 
social; those yielding smallest differences, neutral and family. 


DISCUSSION 


It was predicted that, if the process of applicational transfer is 
influenced by inhibition of concepts involved in past learning, 
‘superior appliers’ should show a lesser degree of inhibition, as 
determined by reaction time on a word association test, than should 
‘inferior appliers’. The predicted difference is clearly indicated. 

Individual results are congruent with group means with the exception 
of one S in the inferior group who obtained mean latency 
below two seconds for all areas. Analysis of his test answer sheet 


TABLE II.—DIFFERENTIAL LATENCY OF SUPERIOR AND INFERIOR
APPLIERS ON A WORD ASSOCIATION TEST
(mean seconds)

                           Related Area
Group         N     Academic   Social   Family   Religious   Neutral   Total
Superior      9     1.66       1.84     1.25     1.44        1.56      1.57
Inferior      6     2.86       3.35     2.18     3.34        2.34      2.62
Difference          1.20       1.51      .93     1.90         .78      1.05
t                   2.93       3.26     4.90     3.43        2.87      3.73
P                    .02        .01      .01      .01         .02       .01

revealed that he failed to attempt twenty-eight of the eighty appli- 
cation items because, he subsequently stated, he was overly cau- 
tious on that part. Had he guessed on all omitted items, his appli- 
cation score, by chance alone, might have been such that he would 
not have been selected as a subject. 

Rapaport (8: p. 17) suggests that a latency of four seconds be 
considered indicative of a disturbed response. Responses of in- 
ferior appliers include an average of 8.0 disturbed responses per 
subject whereas those of the superior group include an average of 
1.5 such responses. 
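
Counting disturbed responses by this criterion is straightforward, and a minimal sketch follows. The latencies are invented, and treating a latency of exactly four seconds as disturbed is an assumption about how the threshold was applied.

```python
def count_disturbed(latencies, threshold=4.0):
    """Number of responses with latency at or above the threshold (sec.)."""
    return sum(1 for t in latencies if t >= threshold)


def mean_disturbed_per_subject(group, threshold=4.0):
    """Average count of disturbed responses across a group of subjects.

    group is a list with one list of latencies per subject.
    """
    counts = [count_disturbed(latencies, threshold) for latencies in group]
    return sum(counts) / len(counts)


# Invented latencies, recorded in fifths of a second, for two subjects.
hypothetical_group = [
    [1.2, 4.0, 5.6],
    [0.8, 1.4, 4.2],
]
average = mean_disturbed_per_subject(hypothetical_group)
print(average)
```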

Alternative explanations may be offered. First, if reaction time 
is slower for the inferior group, it is possible that they did not com- 
plete the achievement test. If so, the differential between applica- 
tion and information scores might not be indicative of lesser appli- 
cation ability. The application section was administered first, 
however, and was completed by all well within the time limit. 

Secondly, the inferior group may have been more cautious, thus 
omitting more items and lowering the application score. While this 
in itself might be of interest, an item count indicates that, with 
the exception of the one S discussed above, the poor appliers 
omitted fewer than half as many items per S as did the superior 
group. The possible significance of this result has not been explored. 

Thirdly, some unconscious bias on the part of the investigator, 
and consequent slowing of his own reaction time, may have occurred even 
though the investigator was unaware of the group to which a sub- 
ject belonged. Analysis of individual record sheets indicates that 
all but one of the inferior appliers had some words with latency 
comparable in brevity to those of the high appliers. Moreover, 
one member of the inferior group yielded a latency profile equiva- 
lent in brevity to those of the superior group except for one dis- 
turbed area. While not conclusive, these factors suggest the un- 
tenability of such an alternative. 


SUMMARY 


An attempt has been made to define ‘applicational transfer’ and 
to test the hypothesis that limited ability to apply principles in 
problem-solving is, in part, a function of the extent to which words 
have been inhibited or blocked. Fifteen Ss were assigned to two 
groups on the basis either of a large or a small differential between 
application scores and information scores on a previously validated 
achievement test in psychology. The groups were shown to be 
approximately equated in certain other respects. With greater than 
normal latency on a controlled association test as the criterion of 
inhibition, superior appliers were found to have significantly shorter 
reaction time and fewer disturbed responses than inferior appliers. 
Alternative explanations are discussed. 

A relationship between variables belonging to different areas of 
investigation, applicational transfer and emotional disturbance, 
has been suggested. 


RESPONSE STRENGTH IN A CLASSROOM 
TASK RELATED TO A ‘FORWARD’ 
DELAY IN REINFORCEMENT 


DONAVON AUBLE and EDMUND MECH¹ 


Institute of Educational Research, Indiana University 


That a relation exists between amount of learning and immediacy 
of reinforcement is well known. However, in most studies of this 
relationship, the experimenter has allowed the ‘response’ to be 
made prior to the administration of reinforcement after a specified 
period of time. Thus, the gradient of delay which is usually studied 
is the backward one; this means simply that the response precedes 
the reinforcement. On the other hand, the results of Muenzinger, 
Dove, and Bernstone (10), Jenkins (7, 8, 9), Thompson and Dove 
(12), indicate that a gradient of delay appears to exist in a ‘for- 
ward’ direction; that is, with the reinforcement preceding the re- 
sponse. 

Since the counterpart of a reinforcing state of affairs preceding 
the S-R conjunction is apparent in many classroom situations, it 
appears desirable to extend our notions on this point at the human 
level. 

¹ The authors wish to expressly thank both Dr. Wendell W. Wright, Dean, 
School of Education, Indiana University, and Dr. Nicholas Fattu, Director, 
Institute of Educational Research, Indiana University, for their support of 
basic experimentation. Our continuing appreciation is extended, also, to 
Mr. Homer Terrell, Principal of Central Elementary School, Martinsville, 
Indiana, who has not only granted permission for a series of classroom 
experiments, but, in addition, has materially aided in engendering a clearer 
understanding between psychologist and teacher, a trend, we might add, 
that could profitably be perpetuated. 

Hull (5) points out that the significance, if any, of ‘forward’ delay 
in reinforcement has not yet been carefully explored, but he is of 
the opinion that since the reduction of a need necessarily follows 
rather than precedes the act, it would seem that the ‘forward’ de- 
lay could hardly play an important rôle in selective learning. The 
problem, then, was to test the prediction that differential perform- 
ance would result from delayed and immediate reinforcement, when 
reinforcement precedes the response in a group situation. 


PROCEDURE 


The subjects were sixty-four third-grade pupils at Central 
Elementary School in Martinsville, Indiana. Groups I and II were 
composed of two independent third-grade classes. Day 1 was de- 
voted to obtaining an initial measure of ability for the two groups 
in a routine computational task. Group I was verbally reinforced 
on Days 2-5 immediately before being given the task, while Group 
II was required to wait one and one-half hours after receiving rein- 
forcement before the task was given. On Days 6-8, conditions were 
reversed. Group II received the task immediately after reinforce- 
ment, while the Group I delay between reinforcement and task 
was one and one-half hours. 
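
The counterbalanced schedule just described can be written down as a small lookup. This is only a sketch: the function name and the choice to return `None` for the Day 1 pre-test are illustrative assumptions, while the groups, days and the ninety-minute delay come from the procedure above.

```python
DELAY_MINUTES = 90  # one and one-half hours, as stated in the procedure


def delay_before_task(group, day):
    """Return minutes between reinforcement and task, or None on the Day 1 pre-test."""
    if group not in ("I", "II"):
        raise ValueError("group must be 'I' or 'II'")
    if day == 1:
        return None  # initial measure; no delay condition is described for Day 1
    if 2 <= day <= 5:
        # Days 2-5: Group I immediate, Group II delayed.
        return 0 if group == "I" else DELAY_MINUTES
    if 6 <= day <= 8:
        # Days 6-8: conditions reversed.
        return DELAY_MINUTES if group == "I" else 0
    raise ValueError("day must be between 1 and 8")
```

Each group thus serves under both conditions, which is what allows the reversal on Days 6-8 to act as a check on the Days 2-5 comparison.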

The reinforcement was similar to that used by Auble and Mech 
(1, 2) in previous classroom investigations. The teachers were re- 
quired to adhere to the following verbal reinforcement throughout 
the experimental sessions: “You children are doing very nice work 
on the problems and are improving rapidly. Let me see how much 
better you can do today. Try to avoid any careless mistakes, and 
try to do as many problems as you can.” 

The assumption, of course, is that the verbal stimuli from the 
teacher, designated as belonging to a class ‘reinforcing’, will in- 
crease the probability that a particular class of response will occur 
more frequently than if these stimuli were withheld. 

The reinforcement was administered at approximately the same 
time each day. However, the task was either given immediately 
after, or delayed one and one-half hours in accordance with the 
operations previously described. 

All S’s were required to participate in the task as part of the class- 
room routine, and, so far as can be determined, were naïve with 
respect to the fact that an experiment was being carried out. 

The task was, in the main, a routine arithmetical one consisting 
of one-digit number combinations that were to be added or sub- 
tracted. Each S was allowed five minutes a day to work the prob- 
lems, there being enough problems, however, so that the most 
rapid subject was not able to finish before the time limit. Teachers 
were cautioned to avoid remedial work dealing with the task 
throughout the experiment. 


RESULTS AND DISCUSSION 


Figure 1 shows the cumulated transformed correct responses for 
Groups I and II under both delay conditions. It should be noted 
that a square root transformation was applied to the raw data. 
This was done in accordance with a recommendation by Bartlett 
(3) because of the proportionality of the successive means and 
variances, as well as heterogeneity of variance. 
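
A minimal sketch of the transformation step follows. It assumes that each day's count of correct responses is transformed first and the transformed values are then cumulated across days; the text does not state the order, and the function names are illustrative only.

```python
import math
from itertools import accumulate


def sqrt_transform(counts):
    """Apply the square-root transformation to a list of daily counts."""
    return [math.sqrt(count) for count in counts]


def cumulated_transformed(counts):
    """Running total of the square-root-transformed daily counts."""
    return list(accumulate(sqrt_transform(counts)))
```

When variances rise in proportion to means, as is typical of count data, taking square roots tends to bring the variances of the groups closer together, which is the usual reason for applying this transformation before an analysis of variance or covariance.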

An analysis of covariance was carried out between the transformed 
Day 1 correct responses and the mean cumulated transformed correct 
responses on Day 5. The obtained F value of 1.75 was not 
statistically significant at the .05 level. This nonsignificant F 
indicates simply that the differences in the cumulated means of the 
groups on Day 5 can be accounted for by differences in mean level 
of initial ability as measured in the pre-test trial on Day 1; for, 
as Fisher (4) points out, the means of the groups on Day 5 have been 
‘adjusted’ by the analysis to a common mean initial level of 
performance on Day 1. 

Fig. 1. Mean cumulated transformed correct responses for two groups 
of S’s. On Days 2-5 Group I received no delay, while for Group II a 
one and one-half hour delay existed between reinforcement and task. 
Delay conditions for Days 6-8 were reversed for the two groups. 
(Group N = 32). 

Similarly, a covariance analysis was computed between the Day 
5 and Day 8 mean cumulated transformed correct responses. This 
operation was carried out in order to test for differences when delay 
in reinforcement was reversed from Group II to Group I. The ob- 
tained F value of 1.67 was again not statistically significant, indi- 
cating that whatever difference existed on Day 8 could not be at- 
tributed to the delay in reinforcement. A glance at Figure 1 shows 
that Group I maintained a relatively superior rank of correct 
responding over the seven sessions and the delay in reinforcement 
during Days 6-8 clearly did not exert any suppressing effects. 
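
The covariance adjustment used in these analyses can be sketched as follows for a single covariate. This is the standard textbook computation, not a reconstruction of the authors' worksheets; the function names and the input layout are assumptions, and no data from the study are reproduced.

```python
def _sums(xs, ys):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxx, sxy, syy


def ancova_f(groups):
    """One-covariate analysis of covariance.

    groups: list of (covariate_values, outcome_values) pairs, one per group.
    Returns (F, df_between, df_error).
    """
    k = len(groups)
    n_total = sum(len(xs) for xs, _ in groups)

    # Pooled within-group sums of squares and cross-products.
    sxx_w = sxy_w = syy_w = 0.0
    for xs, ys in groups:
        sxx, sxy, syy = _sums(xs, ys)
        sxx_w += sxx
        sxy_w += sxy
        syy_w += syy

    # Total sums of squares and cross-products about the grand means.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    sxx_t, sxy_t, syy_t = _sums(all_x, all_y)

    # Remove the part of the outcome variation predictable from the covariate.
    adjusted_total = syy_t - sxy_t ** 2 / sxx_t
    adjusted_within = syy_w - sxy_w ** 2 / sxx_w
    adjusted_between = adjusted_total - adjusted_within

    df_between = k - 1
    df_error = n_total - k - 1  # one extra degree of freedom lost to the slope
    f_value = (adjusted_between / df_between) / (adjusted_within / df_error)
    return f_value, df_between, df_error
```

If the two classes of thirty-two pupils were entered with one covariate, the degrees of freedom would be 1 and 61, for which the .05 critical value of F is roughly 4.0; this inference about the degrees of freedom is not stated in the text.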

A similar analysis was carried out for the error responses. Figure 
2 shows Group I and II errors under the two conditions of delay 
for a seven-day period. A covariance analysis between the initial 
measure and Day 5 indicated that the group difference was not 
significant (F < 1). Finally, the covariance F value of 3.20 that 
was computed following the reversal of delay conditions (Days 
6-8) also proved nonsignificant. 

Fig. 2. [Errors for two groups of] S’s. On Days 2-5 Group I received 
no delay between reinforcement and task, while for Group II a one 
and one-half hour delay existed. On Days 6-8 delay conditions were 
reversed for the groups. (Group N = 32). 

Unfortunately, no classroom or group data exist at present with 
which to compare the cited results. The data, however, indicate 
quite clearly that no differences existed between the groups that 
can be attributed to the delay between reinforcement and task. 
These data are not compatible with the results of previous 
investigations (10, 12) which utilized infra-human organisms, and 
especially so when compared with Jenkins (7, 8, 9), whose data 
strongly suggest that a gradient of delay exists when reinforcement 
precedes the response. A more specific comparison might be made 
between Hull’s (6) Corollary iii and the obtained data in the 
present study. 

Hull’s Corollary iii B states as follows: “The greater the delay 
in the receipt of the incentive by groups of learning subjects, the 
weaker will be the resulting learned reaction potential...” (6, p. 
132). 

Since Hull refers to his treatise as a general behavior system, 
not limited to infra-human organisms, the ‘deduction’ to be made 
from the above corollary is that if each subject on each trial under- 
goes all the delays of the experiment, performance or reaction po- 
tential should be weaker as delay is increased. The present empirical 
data clearly do not correspond to this prediction. Such a result, 
however, appears to bear out Hull’s statement that the corollary 
in question has yet to be verified in ‘detail’. 

In a recent review Stroud (11) makes a cogent case for pursuing 
a program of experimentation to clarify certain notions with re- 
spect to a behavior theory for education. Certainly animal experi- 
mentation and theory construction have made rapid progress dur- 
ing the last decade, presenting the educational psychologist with 
many concepts that appear to be testable under classroom condi- 
tions. One such concept has been tested in a classroom situation. 
Although the design was relatively simple, it should be noted that 
compatible results were obtained from the two independent class- 
room groups. As further, more intensive classroom experimentation 
develops and as results are verified by experimenters working in- 
dependently, it will not be too surprising to witness a revision of 
many current behavior ‘postulates’. There is no reason, then, why 
a behavior system for education should not basically develop from 
the crude data and problems of the classroom. 


SUMMARY 


The present study began with the belief that progress could be 
made toward evolving a behavior system for education by virtue 
of testing certain concepts derived from learning theory. Specifi- 
cally, the question to be answered in the present experiment was: 
Would differential performance be obtained in a specified task 
under two varying conditions of delay in verbal reinforcement? The 
deduction to be made from existing experimental literature on the 
infra-human level is simply that as delay in reinforcement is in- 
creased response strength in a specified task should decrease. 

To test whether delayed reinforcement would have less advan- 
tage than immediate reinforcement in a classroom situation, the 
following procedure was used: Two independent third-grade class- 
room groups composed of thirty-two subjects each worked in a rou- 
tine task for five minutes a day for eight days. Day 1 was devoted 
to obtaining an initial measure of ability for the two groups. Group 
I was verbally reinforced by their teacher on Days 2-5 immediately 
before being given the task, while Group II was required to wait 
one and one-half hours after receiving reinforcement before the 
task was given. On Days 6-8, conditions were reversed. Group II 
received the task immediately after reinforcement, while the 
Group I delay between reinforcement and task was one and one- 
half hours. 

The covariance analyses for both error and correct responses 
indicated that no statistically significant differences existed be- 
tween groups throughout the experiment that could be attributed 
to the delay between reinforcement and task. The results clearly 
were not compatible with a prediction from a theoretical model 
that was based largely on infra-human data. The study, however, 
is indicative of a future program of intensive classroom experi- 
mentation that if pursued vigorously could contribute materially 
to the construction of a theoretical learning framework for educa- 
tion. 


BOOK REVIEWS 


HELEN M. ROBINSON, Ed. Clinical Studies in Reading II. Suppl. 
Educ. Monog. No. 77. Chicago: University of Chicago Press, 
1953, pp. 189. $3.75. 


In this second report from the Reading Clinic of the University 
of Chicago, emphasis is placed upon visual problems related to 
difficulties in learning to read. The first part deals with treatment 
of poor readers with vision problems. Included are discussions of 
types of problems encountered, visual screening tests used, pro- 
cedures for referral, and teaching adaptations. It appears that 
visual difficulties interfere with adequate reading progress in in- 
dividual cases. The incidence of visual problems among poor readers 
is sufficiently high to warrant visual testing of all disability cases 
to identify cases in need of examination by professional refraction- 
ists. Results of such examinations provide information useful in 
remedial teaching. 

Four reports of staff research are given in Part II. These reports 
are concerned with the factors of visual efficiency, personality 
adjustment, eye-hand preference and reversals, and prediction of 
increase in reading rate in relation to progress in learning to read. 
In general, intimate relationship between these factors and reading 
progress was absent. However, pertinent evaluations of the experi- 
mental techniques employed were made, with suggestions for fur- 
ther research. 

In Part III appear four reports of research by graduate students. 
Important findings include: (1) A combination of results from 
visual screening tests plus teacher observation of symptoms provides 
a better basis of referral to a refractionist than either alone. (2) 
Girls tend to excel boys in reading through the first three or four 
grades only. (3) Although auditory acuity and auditory discrimi- 
nation do not appear to be widespread causes of inefficient word 
recognition, there is a suggestion that auditory memory span is 
important. (4) Important factors in leisure reading by retarded 
readers include desire to read, availability of books of proper in- 
terest and easy vocabulary, short books, and a scheduled time for 
free reading. 

Part IV consists of five papers by leaders in the field on vision 
and reading delivered at a conference on the subject. Jobe’s paper 
is an excellent discussion of the structure and function of the visual 
mechanism. Gesell emphasizes certain developmental factors in 
vision and reading. The visual problems of poor readers receive 
adequate attention in an analysis and evaluation by Eames. In 
her discussion of visual training and reading, Eberl emphasizes the 
need for a complete case study of all visual functions as well as a 
visual examination for disabled readers. In certain cases, visual 
training should supplement remedial teaching to restructure and 
build the most reliable visual behavior pattern possible. Such train- 
ing consists of ‘teaching to see’ rather than just ‘muscle exercise’. 
This approach is not clearly understood by many clinicians. 

The editor of this monograph points out that the series of experi- 
mental reports and discussions do not provide final answers to the 
problems considered. However, the contributions are important in 
advancing our knowledge of the relation of visual defects to read- 
ing progress, in evaluation of experimental and measurement tech- 
niques, in posing additional problems for investigation, and in 
pointing out implications of the findings. All workers in remedial 
reading will profit by careful reading of this monograph. 

MILES A. TINKER 

University of Minnesota 


HAROLD W. BERNARD. Mental Hygiene for Classroom Teachers. 
New York: McGraw-Hill Book Company, 1952, pp. 472. 


“This volume is an outgrowth of fifteen years of experience in 
teaching courses relating mental hygiene to education.” (p. vii). 
The book is in four parts. Part 1, Basic Considerations, discusses 
the need for the mental hygiene point of view; human needs and 
mental health; the nature of maladjustment; meeting the needs 
of children; and special needs of adolescents. 

Part 2, Mental Hygiene in the Classroom, deals with teacher 
personality and pupil behavior; understanding and helping children 
with problems; the mental hygiene of discipline; teaching methods 
and pupil adjustment; some questionable school practices; per- 
sonality problems in the classroom; and constructive classroom 
approaches to mental health. 

Part 3, Special Approaches to Mental Health, includes art as an 
approach to mental health; writing for understanding and release; 
using drama and play; and limitations and precautions regarding 
mental hygiene. 

Part 4, The Teacher’s Mental Health, discusses a positive view 
of the teaching profession; the teacher’s philosophy—in the school; 
and the teacher’s philosophy—adult mental hygiene. 

“Emphasis is placed on the normal child and on mental hygiene 
in the classroom rather than through the use of clinics, formal 
guidance services, etc.” 

“Improvement in educational practices is based on willingness 
to try theoretically sound methods in spite of beliefs that present 
techniques are efficacious. But such experimentation should be 
based on a consistent point of view—there should be a frame of 
reference. The outlining of such a frame of reference is the task 
that the writer has undertaken.” (p. vii). Apparently mental hy- 
giene is the frame of reference. There are techniques that are only 
for the use of experts. The writer is concerned in this volume with 
those which should be known and used by teachers. 

This is not a fundamental text in mental hygiene but an attempt 
to show how mental hygiene can help in the work of the school. 
It is thus primarily a book on education. Wallin is not mentioned, 
although Shaffer (1936) is quoted. References are especially to pub- 
lications concerning teaching methods and teaching aids. 

The spread of subject matter can be seen in subjects discussed 
or references cited: teaching how to write; relations of adminis- 
trators to teachers; problem of aging; grading systems; need for 
exercise; learning; projective technique; use of test data; ventila- 
tion control; visual aids; therapy through art; sociometry and 
pupil adjustment; ‘experimental approaches’ which seem to be 
‘readily available to ordinary classroom teachers’. Psychodrama 
and sociodrama (Sociodrama is an extension of psychodrama) (p. 
343) are discussed and recommended for the use of the classroom 
teacher. 

The statements, “There is a correct way to rear a child,” and 
“There is a correct way to teach,” (pp. 375, 376) lead us to wonder 
if these have been discovered; but also leave the suspicion that 
probably there are several ways for each task, and that different 
ways might be necessary to deal with individual differences. The 
latter are mentioned, grudgingly, in two paragraphs (p. 146). 

The impression given by the book is that many good suggestions 
are given for teachers; that special techniques are assumed to be 
available to teachers but with questionable foundation and train- 
ing; that there are dogmatic and imperative statements, not all of 
which can be accepted; some of these are constantly repeated 
statements found in pedagogical literature, and, though trite, some 
of them may bear repeating. 

The purpose and attitude of the book are good; the style is 
simple and clear; some terms are used loosely, e. g., experimenta- 
tion and experimental; the spread of subject matter suggests the 
survey course which gives orientation but which does not deal 
thoroughly with any one subject. 

A. S. EDWARDS 

The University of Georgia 


OLIVER L. LACEY. Statistical Methods in Experimentation. New 
York: The Macmillan Company, 1953, pp. 249. 


This little text should prove to be an excellent one for a first 
course in statistics. It touches upon experimental design, probabil- 
ity and the normal distribution, tests of significance of means and 
of differences between means, Chi-square, correlation and regres- 
sion, and fiducial limits. Assumptions underlying each statistical 
technique are explicitly stated and justified verbally. The beginner 
is not confused by extensive derivation of formulae, an approach 
which is quite desirable wherever classes include many persons with 
little or no background in mathematics. Such topics as sampling 
theory, and the differences between one-tailed and two-tailed tests 
of significance, for example, are introduced easily and simply in a 
way which should provide a good foundation for later expansion. 
The clarity of exposition is remarkably good. One might wish to 
complement the discussion of small sample theory, and perhaps 
applications of the Chi-square test, if one’s class moved fast enough 
to permit it. 

Many examples and problems are included. Usually these are 
at the ends of chapters, but some are within the chapters, close to 
pertinent discussion. These are in addition to the illustrative ex- 
amples used by Lacey in his discussion. The examples are to be 
worked by the student, and there are answers given for these, 
along with some very good explanatory material for each item. 
Problems follow, involving the operations just discussed in the 
chapter and illustrated and explained in the examples. For the 
convenience of the instructor, answers ought perhaps to be pro- 
vided somewhere for these also, although they may be easily 
worked out in a short time. The examples and problems are nicely 
distributed over several kinds of research studies, so that applica- 
tions of given techniques to various fields are well illustrated. 

Several tables are included, not all in the form usually encoun- 
tered in similar texts. Table B, for example, is an abbreviated 
version of the familiar table of percentages of total area under 
the normal curve between the mean ordinate and other ordinates. 
It would require the student to interpolate more frequently than 
with the fuller table, which might provide an additional hazard 
for the beginner. 

As one who has taught elementary statistics for several years to 
students who have almost no background for the course, the re- 
viewer believes this book comes as close as any he has seen to being 
an ideal teaching text for that kind of class. It may be considered 
too limited in scope by those who have more advanced classes, or 
who wish their courses to cover more ground. For students in 
psychology, education, sociology, and related fields who are limited 
in background, however, this text should provide an excellent in- 
troduction to statistics and experimental design which involves 
very little suffering. 

CECIL M. FREEBURNE 

Bowling Green (Ohio) State University 


CHARLES KOCH. The Tree Test. New York: Grune and Stratton, 
1952, pp. 87. 


In this book are presented the interpretations which the author 
has found applicable to elements of the form, location, and grapho- 
logical features of drawings of trees, used as a projective technique. 
For this purpose the subject is requested ‘to draw a fruit tree’. 
The author further suggests that the subject draw additional trees 
‘if one has the impression that the first drawing obtained does not 
correspond with the person tested or that the drawing differen- 
tiates too little.’ He claims that such successive productions tap 
different layers of the psyche. 

The introduction, which comprises the first half of the book, 
consists of a lyrical and imaginative treatise on the symbolism of 
the tree followed by several illustrative analyses of test subjects. 
With regard to the validity of his interpretations the author states 
only that they have been proved correct “(1) by the nature of the 
recorded expression, (2) by comparative investigations, by direct 
observations, by testing and by judgements of independent per- 
sons.” The test has been employed mainly in the vocational guid- 
ance of adolescents and adults, and the author points out with 
proper caution that additional samples would be needed to extend 
the interpretation of tree drawings to special groups such as the 
mentally deficient or the mentally ill. He also suggests that the 
test be used in conjunction with other tests for diagnosis and 
counseling so that the findings may check and augment each 
other. It is unfortunate that the author does not see fit to extend 
the same cautious and critical attitude toward the interpretations 
themselves. 

The only objective data contained in the book are tables pre- 
senting the developmental trends for various tree-drawing charac- 
teristics in the age range from five to sixteen years, and a com- 
parison between two groups of the same age but different mental 
endowment. Unfortunately, all statistics are presented as per- 
centages with no indication of the sizes of the samples studied. 

The second half of the book contains clear and detailed illus- 
trations of possible variations in the drawings of root, trunk, and 
crown together with lists of interpretations for each detail. The 
many different interpretations suggested for any one drawing 
characteristic should give the psychodiagnostician ample leeway 
for employing his imagination in producing a rich character por- 
trayal of the subject. However, being a skeptic with regard to the 
principles of graphological interpretation and the “scientific anal- 
ysis of forms of expression,” this reviewer is inclined to wish for 
less fantasy and more fact. 

This book may have some value for those who are particularly 
interested in drawing as a projective technique. It would be quite 
interesting to compare the interpretations given here with those 
offered for the tree drawings on the HTP test. It might also be of 
interest to determine whether the same developmental trends ob- 
tain in this country as those found by the author for Swiss children. 

GOLDINE C. GLESER 


Washington University School of Medicine 


JAMES L. MURSELL. Using Your Mind Effectively. New York: 
McGraw-Hill Book Company, 1951, pp. 254. 


In spite of the more general implication of the title and the 
author’s protestation against confining the concept of mental effec- 
tiveness to scholastic efficiency, Using Your Mind Effectively can 
be described best as a ‘how-to-study’ book written in popular style. 
The theme upheld throughout the volume is that the achievement 
of better thinking is the requisite for attaining greater adeptness 
in studying. Since the desideratum of better thinking is learnable 
and teachable, Mursell’s task is to show the way to the reader. 
The steps in reaching comprehension and understanding are for- 
mulated as: (1) obtaining an over-all view of the material under 
consideration, (2) identifying its main features, and (3) working 
the details into their proper relationship to the main skein. 

After an introductory section on the value of mental efficiency, 
the book is broken down into three main divisions: one dealing 
with an exposition of the general psychological principles to be 
observed in attaining intellectual mastery and two devoted to the 
demonstration of practical applications. In Part One the author 
elaborates on the nature and usefulness of the sequence: picking up 
the main thread, identifying essentials, and relating the details to 
the whole. He discusses the importance of this sequence in the 
reading of textbooks, study of college courses, and the writing of 
term papers and theses. The construction of a mental map into 
which essential parts and details may be fixed is offered as the 
modus operandi for achieving mental excellence. Applications of 
this technique to non-academic situations are demonstrated, e.g., 
preparing a talk, orienting oneself to a new job, arranging one’s 
budget, and learning to play tennis. 

In Part Two standard topics of the ‘how-to-study’ books such as 
budgeting time, note-taking and note-using, self-testing, and con- 
centration are discussed under the rubric of working tools and 
practical plans. The general rationale for the recommended ap- 
proaches as well as the minutiae of procedural detail is set forth. 
Much of this is the stock advice on these topics of ‘how-to-study’ 
books. The author appears at times to be self-conscious about some 
of the trivia which he presents. After marshalling a lengthy list of 
materials and equipment that should be kept in the study room, 
he feels constrained to remark that he offers this for whatever it is 
worth. In this division of the book the leitmotif is that the facilita- 
tion of thinking should be the governing consideration in the 
arrangement of time, conditions, and methods of work. Everything 
should be instrumental to the goal of encouraging thinking; note- 
taking and review are seen as experiences in thinking. 

In Part Three under the title “Some More Extended Applica- 
tions” appear the following topics: memorizing, reading, writing 
term papers and theses, and creative thinking. Good memorization 
is viewed as conditional upon proper purposing, understanding, and 
noticing. Analyzing, interpreting, and thinking are considered the 
crux of the development of reading skill. In writing term papers, 
it is recommended that the outline and over-all plan be completed 
before a single word of the final manuscript is written. In a very 
short chapter on creative thinking, Mursell makes the point that 
whenever one grasps the meaning of another person’s writing, one 
is engaged in creative thinking. The author makes clear that his 
aim in offering suggestions is not to facilitate the functioning of 
routine-minded workers, but to influence the intellectual worker 
to approach even mundane tasks as invitations to creative think- 
ing. He concludes his work with the statement that he has tried 
to make everything in the book center on creative thinking. 

After finishing this book, the reader might legitimately ask what 
progress psychology has made, since William James, in offering 
assistance to students in their pursuit of academic success. No 
later writer seems able to go beyond or even approach the level of 
James’ Talks to Teachers on Psychology: and to Students on Some 
of Life’s Ideals published in 1899. Basically little that is new since 
Kitson’s How To Use Your Mind appears evident in the more 
recent books on ‘how to study’. Aside from the Q 3 R technique 
of Ohio State University for studying chapters in the convention- 
ally organized textbook and the ‘push on’ or quick reading method 
for mastering foreign languages, there is in the book under review 
scarcely any reference to new developments in effective study 
methods. Psychologists who choose to write ‘how-to-study’ books 
seem to feel obliged to approach their topics as exclusively cog- 
nitive-intellective matters unrelated to the emotions and the total 
personality. They appear to be oblivious to the psychoanalytic 
movement and to dynamic psychology in general. Mursell fails to 
suggest that the inability to understand and think effectively may 
be due to emotional factors as much as to poor techniques in 
cognition. The failure to integrate cognitive functions with the 
entirety of the personality accounts for the antediluvian character 
of most ‘how-to-study’ books. 

Mursell has done a good job of expounding the traditional ap- 
proach for developing mental efficiency. However, at times, the 
style is repetitious, long-winded, and too general without the com- 
pensation of being inspirational—hence boring. The standard of 
writing at times seems to dip below the college level to which it 
is apparently directed. The attempt to offer applications to non- 
academic pursuits appears to be simply a gesture. For those who 
want a simple and very clear exposition of the conventional views 
on effective use of the mind, this book may be most satisfying. 

PHILIP M. KITAY 


Adelphi College 


C. W. ODELL, How To Improve Classroom Testing. Dubuque, Iowa: 
William C. Brown Company, 1953, pp. 156. 


Professor Odell has prepared a student manual that emphasizes 
the practical and non-technical aspects of the construction and 
administration of informal or teacher-made tests of achievement. 
Students of education will be pleased to find that he sets the prob- 
lem of measuring achievement in the context of curriculum de- 
velopment by taking the objectives of education as the definitions 
of the achievements that are desired. He then concentrates on 
testing programs and types of tests, excluding problems of intelligence
and personality measurement. Chapters IV through XII 
are devoted to test construction principles and illustrations of 
various types of test items or problems; these illustrations are 
drawn from a number of subject matter fields. The manual appears 
to be a good source of ideas for test item types, and as such it 
should be of value to teachers. One chapter on administration and 
scoring and one on statistical methods in connection with testing 
complete the volume. The chief contribution of the manual is its 
rather extensive and practical advice on how to develop and use 
informal tests. 

CHESTER W. HARRIS 

The University of Wisconsin 


PHYLLIS GREENACRE. Trauma, Growth and Personality. New York: 
W. W. Norton and Co., 1952, pp. 328. 


In the past decade Greenacre has published many psychoana- 
lytically oriented studies of early personality development. These 
have appeared in many journals; most of them in the American 
Journal of Orthopsychiatry or the Psychoanalytic Quarterly. The 
main purpose of this volume is to make a selection from these 
readily accessible to more readers. The connecting link for the 
selections included is that they are all related to the interaction of 
maturation phases and special traumas in the first few years of life. 

Studies included in this volume are The Biological Economy of 
Birth; The Predisposition to Anxiety; Infant Reactions to Re- 
straint; Urination and Weeping; Vision, Headache, and the Halo; 
Anatomical Structure and Superego Development; Conscience in 
the Psychopath; A Study of Screen Memories; The Prepuberty 
Trauma in Girls; Problems of Early Female Sexual Development; 
Respiratory Incorporation and the Phallic Phase; and Some Fac- 
tors Producing Different Types of Genital and Pregenital Organi- 
zation. The volume contains a very favorable and informative 
introduction by Dr. Ernst Kris, a list of references for each chapter 
in the back end of the volume, and an index. 

The studies will meet the unreserved approval of that school of 
psychoanalysts and clinicians who have an interest in and include 
in their studies social as well as individual factors. This group will 
probably react to the content of the volume and the interpretation 
pretty much the same way as Kris does in his introduction. Kris 
sees Greenacre’s studies as a valid elaboration of Freud’s concept 
of the trauma which enables Greenacre to enlarge her vista beyond 
the range of psychological events and to reëmphasize the continuity 
that links earliest physiological reactions of the organism to its 
gradually differentiating psychological experiences. Like him, too, 
they probably would consider the volume as composed of studies 
which are evidence of Greenacre’s “happy and productive fusion 
of observational and speculative writing.” Experimental psychologists
obviously would be less satisfied. In these studies they will 
find no controlled investigations and nothing treated statistically. 
They will instead find clinical and observational reporting. How- 
ever, in all fairness they will have to admit that Greenacre, though 
primarily an analytic clinician, shows a clear familiarity with and 
knowledge of contributions from related fields as well as psycho- 
analysis. For example, on the physiological level she shows famili- 
arity with the works of Cannon, Morgan, and Coghill. On the 
psychological level she shows familiarity with the writings of such 
contributors as Gesell and Halverson, Mandell Sherman, Sontag, 
and Jersild. 

H. MELTZER 

Psychological Service Center 

St. Louis, Missouri 











